Build reliable prompts, RAGs, and agents

Start testing the performance of your models, prompts, and tools in minutes:

npx promptfoo@latest init

Promptfoo runs locally and integrates directly with your app - no SDKs, cloud dependencies, or logins.

Comprehensive security coverage

Custom probes for your application that identify failures you actually care about, not just generic jailbreaks and prompt injections.

Built for developers

Move quickly with a command-line interface, live reloads, and caching.

Battle-tested, 100% open-source

Used by teams serving millions of users and supported by an active open-source community.

Easy abstractions for complex LLM testing

Simple declarative config

# Compare prompts...
prompts:
  - "Summarize this in {{language}}: {{document}}"
  - "Summarize this in {{language}}, concisely and professionally: {{document}}"

# And models...
providers:
  - openai:gpt-4o
  - anthropic:claude-3.5-sonnet

# ... using these tests
tests:
  - vars:
      language: French
      document: "file://docs/*.txt"
    assert:
      - type: contains
        value: "foo bar"
      - type: llm-rubric
        value: "does not apologize"
      - type: cost
        threshold: 0.01
      - type: latency
        threshold: 3000
  - # ...
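
As a rough sketch of how one more entry in the tests list might look: the variable values and assertion values below are hypothetical and chosen only for illustration. The sketch assumes the icontains (case-insensitive substring) and javascript (inline expression over the model output) assertion types. Each test case is evaluated for every prompt and provider combination listed in the config.

```yaml
# Hypothetical additional test case; all values are illustrative only
tests:
  - vars:
      language: German
      document: "file://docs/*.txt"
    assert:
      # Case-insensitive substring check on the model output
      - type: icontains
        value: "zusammenfassung"
      # Inline JavaScript expression; `output` holds the model's response text
      - type: javascript
        value: "output.length < 1000"
```

Keeping each case as a vars block plus an assert list means new checks can be added without touching the prompts or providers.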

