See how your prompt performs across LLMs — side by side, in seconds.
promptside runs the same prompt across multiple language models and shows you a beautiful side-by-side comparison of outputs, token usage, latency, and cost. Built for the workflow every AI dev now has: "a new model dropped — did my prompts regress?"
npx promptside "Explain transformers to a 10-year-old" \
--models claude-opus-4-7,gpt-5,gemini-2.5-flashOutputs a side-by-side terminal view and a self-contained HTML report you can share.
Every time a new frontier model ships, you want to know:
- Does my prompt still work?
- Which model gives the best answer for my use case?
- What's the cost/latency tradeoff?
Existing tools (Promptfoo, Braintrust, etc.) are powerful but heavy — config files, eval frameworks, dashboards, signups. promptside is the opposite: one command, no signup, instant visual diff.
```bash
npm install -g promptside
# or run directly
npx promptside
```

Then run a prompt across models:

```bash
promptside "Write a haiku about debugging" \
  --models claude-opus-4-7,gpt-5,gemini-2.5-flash
```

Or create a `.prompt.md` file (see `examples/` for more):
```markdown
---
models:
  - anthropic:claude-opus-4-7
  - openai:gpt-5
  - google:gemini-2.5-flash
max_tokens: 64
---
Write a haiku about debugging.
```

Then:
```bash
promptside run examples/demo-haiku.prompt.md
```

Re-run automatically on file save:

```bash
promptside run myprompt.prompt.md --watch
```

To generate a shareable HTML report:

```bash
promptside "your prompt" --models claude-opus-4-7,gpt-5 --html report.html
open report.html
```

Set API keys in your environment:
```bash
export ANTHROPIC_API_KEY=...
export OPENAI_API_KEY=...
export GOOGLE_API_KEY=...
```

promptside only calls the providers you actually use.
Note: Gemini's free tier has aggressive rate limits and may return 503/429 errors during peak demand. If you hit this, wait a few minutes or switch to a paid API key at aistudio.google.com/apikey.
Each run captures, per model:
- Full output text
- Input / output tokens
- Latency (ms)
- Cost (USD)
- Character-level diff against the other models' outputs
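If it helps to picture the data, a per-model record along these lines would cover everything above. This is a hedged sketch; the field names are illustrative, not promptside's actual schema:

```ts
// Illustrative shape only; not promptside's actual types.
interface ModelRunResult {
  model: string;                 // e.g. "anthropic:claude-opus-4-7"
  output: string;                // full output text
  inputTokens: number;
  outputTokens: number;
  latencyMs: number;
  costUsd: number;
  diffs: Record<string, string>; // char-level diff vs. each other model's output
}

// Cost is derived from token counts and per-million-token prices, roughly like this
// (the exact rates depend on the provider and model):
function estimateCostUsd(
  r: { inputTokens: number; outputTokens: number },
  pricePerMTokIn: number,
  pricePerMTokOut: number,
): number {
  return (
    (r.inputTokens / 1_000_000) * pricePerMTokIn +
    (r.outputTokens / 1_000_000) * pricePerMTokOut
  );
}
```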
| | promptside | Promptfoo | Braintrust |
|---|---|---|---|
| Setup time | 30 seconds | ~10 min | Signup required |
| Config | Optional `.prompt.md` | YAML eval files | Cloud dashboard |
| Local-first | ✅ | ✅ | ❌ |
| Visual diff | ✅ | ❌ | Partial |
| Eval framework | ❌ (by design) | ✅ | ✅ |
| Best for | Quick prompt comparisons | Full eval pipelines | Team prompt management |
promptside is the tool you reach for when a model drops and you want to know in 30 seconds whether your prompts still work. For full eval pipelines, use Promptfoo. For team workflows, use Braintrust.
- Anthropic, OpenAI, Google adapters
- Terminal + HTML renderers
- `.prompt.md` files with frontmatter
- Watch mode
- Local model support (Ollama)
- Streaming output
- Variable substitution in prompt files
- CI mode (exit code on regression)
PRs welcome. Adapter contributions are especially appreciated; see `src/adapters/` for the pattern.
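As a rough sketch only (not the actual interface in `src/adapters/`; every name, URL, and env var below is hypothetical), an adapter boils down to: take a model id and a prompt, call the provider, and return text plus token counts and latency:

```ts
// Hypothetical adapter sketch; check src/adapters/ for the real pattern.
interface AdapterResult {
  text: string;
  inputTokens: number;
  outputTokens: number;
  latencyMs: number;
}

interface Adapter {
  provider: string; // prefix used in --models, e.g. "anthropic"
  complete(model: string, prompt: string, maxTokens: number): Promise<AdapterResult>;
}

// Example against a generic OpenAI-compatible chat endpoint (URL and env var are made up).
const exampleAdapter: Adapter = {
  provider: "example",
  async complete(model, prompt, maxTokens) {
    const start = Date.now();
    const res = await fetch("https://api.example.com/v1/chat/completions", {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${process.env.EXAMPLE_API_KEY}`,
      },
      body: JSON.stringify({
        model,
        max_tokens: maxTokens,
        messages: [{ role: "user", content: prompt }],
      }),
    });
    const data: any = await res.json();
    return {
      text: data.choices[0].message.content,
      inputTokens: data.usage.prompt_tokens,
      outputTokens: data.usage.completion_tokens,
      latencyMs: Date.now() - start,
    };
  },
};
```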
License: MIT
Built by @lucalouren. If promptside saves you time, a star helps a lot. ⭐
