
promptside

See how your prompt performs across LLMs — side by side, in seconds.

promptside runs the same prompt across multiple language models and shows you a beautiful side-by-side comparison of outputs, token usage, latency, and cost. Built for the workflow every AI dev now has: "a new model dropped — did my prompts regress?"

npx promptside "Explain transformers to a 10-year-old" \
  --models claude-opus-4-7,gpt-5,gemini-2.5-flash

Outputs a side-by-side terminal view and a self-contained HTML report you can share.

[promptside demo GIF]

Why

Every time a new frontier model ships, you want to know:

  • Does my prompt still work?
  • Which model gives the best answer for my use case?
  • What's the cost/latency tradeoff?

Existing tools (Promptfoo, Braintrust, etc.) are powerful but heavy — config files, eval frameworks, dashboards, signups. promptside is the opposite: one command, no signup, instant visual diff.

Install

npm install -g promptside
# or run directly
npx promptside

Usage

Quick comparison

promptside "Write a haiku about debugging" \
  --models claude-opus-4-7,gpt-5,gemini-2.5-flash

From a prompt file

Create a .prompt.md file (see examples/ for more):

---
models:
  - anthropic:claude-opus-4-7
  - openai:gpt-5
  - google:gemini-2.5-flash
max_tokens: 64
---

Write a haiku about debugging.

Then:

promptside run examples/demo-haiku.prompt.md

Watch mode

Re-run automatically on file save:

promptside run myprompt.prompt.md --watch

HTML report

promptside "your prompt" --models claude-opus-4-7,gpt-5 --html report.html
open report.html

API keys

Set these in your environment:

export ANTHROPIC_API_KEY=...
export OPENAI_API_KEY=...
export GOOGLE_API_KEY=...

promptside only calls the providers you actually use.

Note: Gemini's free tier has aggressive rate limits and may return 503/429 errors during peak demand. If you hit this, wait a few minutes or switch to a paid API key at aistudio.google.com/apikey.

Output

Each run captures, per model:

  • Full output text
  • Input / output tokens
  • Latency (ms)
  • Cost (USD)
  • Character-level diff against the other models' outputs
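
As a rough mental model, here is a minimal TypeScript sketch of that per-model record. Every name below is an illustrative assumption, not promptside's actual types:

// Hypothetical shape of one model's captured result (names are illustrative).
interface ModelResult {
  model: string;        // e.g. "anthropic:claude-opus-4-7"
  output: string;       // full output text
  inputTokens: number;  // tokens in the prompt
  outputTokens: number; // tokens in the completion
  latencyMs: number;    // wall-clock request time
  costUsd: number;      // token counts priced at the provider's published rates
}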

Comparison

                 promptside                 Promptfoo             Braintrust
Setup time       30 seconds                 ~10 min               Signup required
Config           Optional .prompt.md        YAML eval files       Cloud dashboard
Local-first      ✅                         ✅                    ❌
Visual diff      ✅                         Partial               ✅
Eval framework   ❌ (by design)             ✅                    ✅
Best for         Quick prompt comparisons   Full eval pipelines   Team prompt management

promptside is the tool you reach for when a model drops and you want to know in 30 seconds whether your prompts still work. For full eval pipelines, use Promptfoo. For team workflows, use Braintrust.

Roadmap

  • ✅ Anthropic, OpenAI, Google adapters
  • ✅ Terminal + HTML renderers
  • ✅ .prompt.md files with frontmatter
  • ✅ Watch mode
  • ☐ Local model support (Ollama)
  • ☐ Streaming output
  • ☐ Variable substitution in prompt files
  • ☐ CI mode (exit code on regression)

Contributing

PRs welcome. Adapter contributions especially appreciated — see src/adapters/ for the pattern.
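
For orientation, here is a rough TypeScript sketch of what an adapter could look like. It is a guess at the shape, not the actual contract; the real pattern is whatever src/adapters/ defines:

// Hypothetical adapter interface, for illustration only.
// The real pattern lives in src/adapters/ and may differ.
interface AdapterResult {
  output: string;       // full output text
  inputTokens: number;
  outputTokens: number;
}

interface Adapter {
  provider: string;     // prefix used in --models, e.g. "anthropic"
  complete(
    model: string,
    prompt: string,
    opts: { maxTokens?: number }
  ): Promise<AdapterResult>;
}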

License

MIT


Built by @lucalouren. If promptside saves you time, a star helps a lot. ⭐
