Command line
The promptfoo command line utility supports the following subcommands:

- init [directory] - Initialize a new project with dummy files.
- eval - Evaluate prompts and models. This is the command you'll be using the most!
- view - Start a browser UI for visualization of results.
- share - Create a URL that can be shared online.
- auth - Manage authentication for cloud features.
- cache - Manage cache.
  - cache clear
- config - Edit configuration settings.
  - config get
  - config set
  - config unset
- debug - Display debug information for troubleshooting.
- generate - Generate data.
  - generate dataset
  - generate redteam
  - generate assertions
- list - List various resources like evaluations, prompts, and datasets.
  - list evals
  - list prompts
  - list datasets
- mcp - Start a Model Context Protocol (MCP) server to expose promptfoo tools to AI agents and development environments.
- scan-model - Scan ML models for security vulnerabilities.
- show <id> - Show details of a specific resource (evaluation, prompt, dataset).
- delete <id> - Delete a resource by its ID (currently, just evaluations).
- validate - Validate a promptfoo configuration file.
- feedback <message> - Send feedback to the Promptfoo developers.
- import <filepath> - Import an eval file from JSON format.
- export - Export eval records or logs.
  - export eval <evalId>
  - export logs
- redteam - Red team LLM applications.
  - redteam init
  - redteam setup
  - redteam run
  - redteam discover
  - redteam generate
  - redteam poison
  - redteam eval
  - redteam report
  - redteam plugins
Common Options
Most commands support the following common options:
Option | Description |
---|---|
--env-file, --env-path <path> | Path to .env file |
-v, --verbose | Show debug logs |
--help | Display help |
promptfoo eval
By default, the eval command reads the promptfooconfig.yaml configuration file in your current directory. If you want to override certain parameters, you can supply optional arguments:
Option | Description |
---|---|
-a, --assertions <path> | Path to assertions file |
-c, --config <paths...> | Path to configuration file(s). Automatically loads promptfooconfig.yaml |
--delay <number> | Delay between each test (in milliseconds) |
--description <description> | Description of the eval run |
--filter-failing <path or id> | Filter tests that failed in a previous evaluation (by file path or eval ID) |
--filter-errors-only <path or id> | Filter tests that resulted in errors in a previous evaluation |
-n, --filter-first-n <number> | Only run the first N tests |
--filter-sample <number> | Only run a random sample of N tests |
--filter-metadata <key=value> | Only run tests whose metadata matches the key=value pair |
--filter-pattern <pattern> | Only run tests whose description matches the regex pattern |
--filter-providers <providers> | Only run tests with these providers (regex match) |
--filter-targets <targets> | Only run tests with these targets (alias for --filter-providers) |
--grader <provider> | Model that will grade outputs |
-j, --max-concurrency <number> | Maximum number of concurrent API calls |
--model-outputs <path> | Path to JSON containing list of LLM output strings |
--no-cache | Do not read or write results to disk cache |
--no-progress-bar | Do not show progress bar |
--no-table | Do not output table in CLI |
--no-write | Do not write results to promptfoo directory |
--resume [evalId] | Resume a paused/incomplete evaluation. If evalId is omitted, resumes latest |
--retry-errors | Retry all ERROR results from the latest evaluation |
-o, --output <paths...> | Path(s) to output file (csv, txt, json, jsonl, yaml, yml, html, xml) |
-p, --prompts <paths...> | Paths to prompt files (.txt) |
--prompt-prefix <path> | Prefix prepended to every prompt |
--prompt-suffix <path> | Suffix appended to every prompt |
-r, --providers <name or path...> | Provider names or paths to custom API caller modules |
--remote | Force remote inference wherever possible (used for red teams) |
--repeat <number> | Number of times to run each test |
--share | Create a shareable URL |
--no-share | Do not create a shareable URL; overrides the config file |
--suggest-prompts <number> | Generate N new prompts and append them to the prompt list |
--table | Output table in CLI |
--table-cell-max-length <number> | Truncate console table cells to this length |
-t, --tests <path> | Path to CSV with test cases |
--var <key=value> | Set a variable in key=value format |
-v, --vars <path> | Path to CSV with test cases (alias for --tests) |
-w, --watch | Watch for changes in config and re-run |
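For example, a typical invocation might combine a few of these flags (the file names and values here are illustrative):

# Run the eval defined in promptfooconfig.yaml with 4 concurrent API calls and write results to JSON
promptfoo eval -c promptfooconfig.yaml --max-concurrency 4 -o results.json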
The eval command will return exit code 100 when there is at least one test case failure or when the pass rate is below the threshold set by PROMPTFOO_PASS_RATE_THRESHOLD. It will return exit code 1 for any other error. The exit code for failed tests can be overridden with the environment variable PROMPTFOO_FAILED_TEST_EXIT_CODE.
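In a CI script you can branch on these exit codes. A minimal shell sketch based on the codes above:

promptfoo eval
status=$?
if [ "$status" -eq 100 ]; then
  echo "Eval completed with failing test cases"
elif [ "$status" -ne 0 ]; then
  echo "Eval errored (exit code $status)"
fi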
Pause and Resume
promptfoo eval --resume # resumes the latest evaluation
promptfoo eval --resume <evalId> # resumes a specific evaluation
- On resume, promptfoo reuses the original run's effective runtime options (e.g., --delay, --no-cache, --max-concurrency, --repeat), skips completed test/prompt pairs, ignores CLI flags that change test ordering to keep indices aligned, and disables watch mode.
Retry Errors
promptfoo eval --retry-errors # retries all ERROR results from the latest evaluation
- The retry errors feature automatically finds ERROR results from the latest evaluation, removes them from the database, and re-runs only those test cases. This is useful when evaluations fail due to temporary network issues, rate limits, or API errors.
- Cannot be used together with the --resume or --no-write flags.
- Uses the original evaluation's configuration and runtime options to ensure consistency.
promptfoo init [directory]
Initialize a new project with dummy files.
Option | Description |
---|---|
directory | Directory to create files in |
--no-interactive | Do not run in interactive mode |
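For example (the directory name is illustrative):

# Scaffold a new project in ./my-eval without interactive prompts
promptfoo init my-eval --no-interactive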
promptfoo view
Start a browser UI for visualization of results.
Option | Description |
---|---|
-p, --port <number> | Port number for the local server |
-y, --yes | Skip confirmation and auto-open the URL |
If you've used PROMPTFOO_CONFIG_DIR to override the promptfoo output directory, run promptfoo view [directory].
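For example (the port number is illustrative):

# Serve the results UI on port 8080 and open it without a confirmation prompt
promptfoo view -p 8080 -y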
promptfoo share [evalId]
Create a URL that can be shared online.
Option | Description |
---|---|
--show-auth | Include auth info in the shared URL |
-y, --yes | Skip confirmation before creating shareable URL |
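For example:

# Create a shareable URL, skipping the confirmation prompt
promptfoo share -y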
promptfoo cache
Manage cache.
Option | Description |
---|---|
clear | Clear the cache |
promptfoo feedback <message>
Send feedback to the promptfoo developers.
Option | Description |
---|---|
message | Feedback message |
promptfoo list
List various resources like evaluations, prompts, and datasets.
Subcommand | Description |
---|---|
evals | List evaluations |
prompts | List prompts |
datasets | List datasets |
Option | Description |
---|---|
-n | Show the first n records, sorted by descending date of creation |
--ids-only | Show only IDs without descriptions |
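For example (flag values are illustrative):

# Show only the IDs of the 10 most recent evals
promptfoo list evals -n 10 --ids-only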
promptfoo mcp
Start a Model Context Protocol (MCP) server to expose promptfoo's eval and testing capabilities as tools that AI agents and development environments can use.
Option | Description | Default |
---|---|---|
-p, --port <number> | Port number for HTTP transport | 3100 |
--transport <type> | Transport type: "http" or "stdio" | http |
Transport Types
- STDIO: Best for desktop AI tools like Cursor, Claude Desktop, and local AI agents that communicate via standard input/output
- HTTP: Best for web applications, APIs, and remote integrations that need HTTP endpoints
Examples
# Start MCP server with STDIO transport (for Cursor, Claude Desktop, etc.)
npx promptfoo@latest mcp --transport stdio
# Start MCP server with HTTP transport on default port
npx promptfoo@latest mcp --transport http
# Start MCP server with HTTP transport on custom port
npx promptfoo@latest mcp --transport http --port 8080
Available Tools
The MCP server provides 9 tools for AI agents:
Core Evaluation Tools:

- list_evaluations - Browse your evaluation runs with optional dataset filtering
- get_evaluation_details - Get comprehensive results, metrics, and test cases for a specific evaluation
- run_evaluation - Execute evaluations with custom parameters, test case filtering, and concurrency control
- share_evaluation - Generate publicly shareable URLs for evaluation results

Redteam Security Tools:

- redteam_run - Execute comprehensive security testing against AI applications with dynamic attack probes
- redteam_generate - Generate adversarial test cases for redteam security testing with configurable plugins and strategies

Configuration & Testing:

- validate_promptfoo_config - Validate configuration files using the same logic as the CLI
- test_provider - Test AI provider connectivity, credentials, and response quality
- run_assertion - Test individual assertion rules against outputs for debugging
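As an illustration, MCP clients that launch STDIO servers (such as Cursor or Claude Desktop) typically register the server with a config entry along these lines; the exact file and schema depend on the client, so treat this as a sketch rather than documented setup:

{
  "mcpServers": {
    "promptfoo": {
      "command": "npx",
      "args": ["promptfoo@latest", "mcp", "--transport", "stdio"]
    }
  }
}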
For detailed setup instructions and integration examples, see the MCP Server documentation.
promptfoo show <id>
Show details of a specific resource.
Option | Description |
---|---|
eval <id> | Show details of a specific evaluation |
prompt <id> | Show details of a specific prompt |
dataset <id> | Show details of a specific dataset |
promptfoo delete <id>
Delete a specific resource.
Option | Description |
---|---|
eval <id> | Delete an evaluation by id |
promptfoo import <filepath>
Import an eval file from JSON format.
promptfoo export
Export eval records or logs.
promptfoo export eval <evalId>
Export an eval record to JSON format. To export the most recent eval, use latest as the evalId.
Option | Description |
---|---|
-o, --output <filepath> | File to write. Writes to stdout by default. |
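For example (the file name is illustrative):

# Export the most recent eval to a file, then re-import it elsewhere
promptfoo export eval latest -o my-eval.json
promptfoo import my-eval.json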
promptfoo export logs
Collect and zip log files for debugging purposes.
Option | Description |
---|---|
-n, --count <number> | Number of recent log files to include (default: all) |
-o, --output <filepath> | Output path for the compressed log file |
This command creates a compressed tar.gz archive containing your promptfoo log files, making it easy to share them for debugging purposes. If no output path is specified, it will generate a timestamped filename automatically.
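For example (the output file name is illustrative):

# Bundle the 5 most recent log files into an archive for a bug report
promptfoo export logs -n 5 -o promptfoo-logs.tar.gz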
promptfoo validate
Validate a promptfoo configuration file to ensure it follows the correct schema and structure.
Option | Description |
---|---|
-c, --config <paths...> | Path to configuration file(s). Automatically loads promptfooconfig.yaml |
This command validates both the configuration file and the test suite to ensure they conform to the expected schema. It will report any validation errors with detailed messages to help you fix configuration issues.
Examples:
# Validate the default promptfooconfig.yaml
promptfoo validate
# Validate a specific configuration file
promptfoo validate -c my-config.yaml
# Validate multiple configuration files
promptfoo validate -c config1.yaml config2.yaml
The command will exit with code 1 if validation fails, making it useful for CI/CD pipelines to catch configuration errors early.
promptfoo scan-model
Scan ML models for security vulnerabilities. Provide one or more paths to model files or directories.
Option | Description | Default |
---|---|---|
-b, --blacklist <pattern> | Additional blacklist patterns to check against model names | |
-f, --format <format> | Output format (text or json ) | text |
-o, --output <path> | Output file path (prints to stdout if not specified) | |
-t, --timeout <seconds> | Scan timeout in seconds | 300 |
--max-file-size <bytes> | Maximum file size to scan in bytes | |
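For example (the model path and output file are illustrative):

# Scan a local model directory and write the findings as JSON
promptfoo scan-model ./models/my-model --format json -o scan-results.json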
promptfoo auth
Manage authentication for cloud features.
promptfoo auth login
Login to the promptfoo cloud.
Option | Description |
---|---|
-o, --org <orgId> | The organization ID to login to |
-h, --host <host> | The host of the promptfoo instance (API URL if different from the app URL) |
-k, --api-key <key> | Login using an API key |
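For example (the API key value is a placeholder):

# Log in non-interactively with an API key
promptfoo auth login --api-key YOUR_API_KEY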
promptfoo auth logout
Logout from the promptfoo cloud.
promptfoo auth whoami
Show current user information.