Promptfoo MCP Server

Expose promptfoo's eval tools to AI agents via the Model Context Protocol (MCP).

Prerequisites
  • Node.js installed on your system
  • A promptfoo project with some evaluations (for testing the connection)
  • Cursor IDE, Claude Desktop, or another MCP-compatible AI tool

Quick Start​

1. Start the Server​

# For Cursor, Claude Desktop (STDIO transport)
npx promptfoo@latest mcp --transport stdio

# For web tools (HTTP transport)
npx promptfoo@latest mcp --transport http --port 3100

2. Configure Your AI Tool​

Cursor: Create .cursor/mcp.json in your project root

.cursor/mcp.json
{
  "mcpServers": {
    "promptfoo": {
      "command": "npx",
      "args": ["promptfoo@latest", "mcp", "--transport", "stdio"],
      "description": "Promptfoo MCP server for LLM evaluation and testing"
    }
  }
}
Development vs Production Configuration

For regular usage: Always use npx promptfoo@latest as shown above.

For promptfoo contributors: The .cursor/mcp.json in the promptfoo repository uses development commands (ts-node src/main.ts) to run from source code. Don't copy that configuration for regular usage.

Claude Desktop: Add to config file

Config file locations:

  • macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
  • Windows: %APPDATA%\Claude\claude_desktop_config.json
  • Linux: ~/.config/Claude/claude_desktop_config.json
claude_desktop_config.json
{
  "mcpServers": {
    "promptfoo": {
      "command": "npx",
      "args": ["promptfoo@latest", "mcp", "--transport", "stdio"],
      "description": "Promptfoo MCP server for LLM evaluation and testing"
    }
  }
}

Restart your AI tool after adding the configuration.

3. Test the Connection​

After restarting your AI tool, you should see promptfoo tools available. Try asking:

"List my recent evaluations using the promptfoo tools"

Available Tools​

Core Evaluation Tools​

  • list_evaluations - Browse your evaluation runs with optional dataset filtering
  • get_evaluation_details - Get comprehensive results, metrics, and test cases for a specific evaluation
  • run_evaluation - Execute evaluations with custom parameters, test case filtering, and concurrency control
  • share_evaluation - Generate publicly shareable URLs for evaluation results

Generation Tools​

  • generate_dataset - Generate test datasets using AI for comprehensive evaluation coverage
  • generate_test_cases - Generate test cases with assertions for existing prompts
  • compare_providers - Compare multiple AI providers side-by-side for performance and quality

Redteam Security Tools​

  • redteam_run - Execute comprehensive security testing against AI applications with dynamic attack probes
  • redteam_generate - Generate adversarial test cases for redteam security testing with configurable plugins and strategies

Configuration & Testing​

  • validate_promptfoo_config - Validate configuration files using the same logic as the CLI
  • test_provider - Test AI provider connectivity, credentials, and response quality
  • run_assertion - Test individual assertion rules against outputs for debugging
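
MCP-compatible clients discover these tools and their input schemas automatically. If you want to inspect the list yourself (for example, against the HTTP transport), MCP uses JSON-RPC 2.0 messages; a minimal tools/list request body looks like this (a sketch; the exact wire framing depends on your client and transport):

{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/list"
}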

Example Workflows​

1. Basic Evaluation Workflow​

Ask your AI assistant:

"Help me run an evaluation. First, validate my config, then list recent evaluations, and finally run a new evaluation with just the first 5 test cases."

The AI will use these tools in sequence:

  1. validate_promptfoo_config - Check your configuration
  2. list_evaluations - Show recent runs
  3. run_evaluation - Execute with test case filtering

2. Provider Comparison​

"Compare the performance of GPT-4, Claude 3, and Gemini Pro on my customer support prompt."

The AI will:

  1. test_provider - Verify each provider works
  2. compare_providers - Run side-by-side comparison
  3. Analyze results and provide recommendations
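
The providers being compared are identified by promptfoo provider IDs. As a rough sketch, entries in a promptfooconfig.yaml providers list might look like the following (the model identifiers here are assumptions; substitute the ones you actually use):

providers:
  - openai:gpt-4o
  - anthropic:messages:claude-3-5-sonnet-20241022
  - google:gemini-1.5-pro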

3. Security Testing​

"Run a security audit on my chatbot prompt to check for jailbreak vulnerabilities."

The AI will:

  1. redteam_generate - Create adversarial test cases
  2. redteam_run - Execute security tests
  3. get_evaluation_details - Analyze vulnerabilities found
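
If you prefer to keep redteam settings in your project rather than passing them through the agent, a redteam section in promptfooconfig.yaml looks roughly like this (a sketch; the available plugin and strategy names depend on your promptfoo version):

redteam:
  purpose: 'Customer support chatbot'
  plugins:
    - harmful
    - pii
  strategies:
    - jailbreak
    - prompt-injection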

4. Dataset Generation​

"Generate 20 diverse test cases for my email classification prompt, including edge cases."

The AI will:

  1. generate_dataset - Create test data with AI
  2. generate_test_cases - Add appropriate assertions
  3. run_evaluation - Test the generated cases

Transport Types​

Choose the appropriate transport based on your use case:

  • STDIO (--transport stdio): For desktop AI tools (Cursor, Claude Desktop) that communicate via stdin/stdout
  • HTTP (--transport http): For web applications, APIs, and remote integrations that need HTTP endpoints
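
For example, to run the server over HTTP and confirm it is reachable (the /health endpoint is also used in Troubleshooting below):

# Start the server on port 3100
npx promptfoo@latest mcp --transport http --port 3100

# In another terminal, verify it responds
curl http://localhost:3100/health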

Best Practices​

1. Start Small​

Begin with simple tools like list_evaluations and validate_promptfoo_config before moving to more complex operations.

2. Use Filtering​

When working with large datasets:

  • Filter evaluations by dataset ID
  • Use test case indices to run partial evaluations
  • Apply prompt/provider filters for focused testing

3. Iterative Testing​

  1. Validate configuration first
  2. Test providers individually before comparisons
  3. Run small evaluation subsets before full runs
  4. Review results with get_evaluation_details

4. Security First​

When using redteam tools:

  • Start with basic plugins before advanced attacks
  • Review generated test cases before running
  • Always analyze results thoroughly

Troubleshooting​

Server Issues​

Server won't start:

# Verify promptfoo installation
npx promptfoo@latest --version

# Check if you have a valid promptfoo project
npx promptfoo@latest validate

# Test the MCP server manually
npx promptfoo@latest mcp --transport stdio

Port conflicts (HTTP mode):

# Use a different port
npx promptfoo@latest mcp --transport http --port 8080

# Check what's using port 3100
lsof -i :3100 # macOS/Linux
netstat -ano | findstr :3100 # Windows

AI Tool Connection Issues​

AI tool can't connect:

  1. Verify config syntax: Ensure your JSON configuration exactly matches the examples above
  2. Check file paths: Confirm config files are in the correct locations
  3. Restart completely: Close your AI tool entirely and reopen it
  4. Test HTTP endpoint: For HTTP transport, verify with curl http://localhost:3100/health

Tools not appearing:

  1. Look for MCP or "tools" indicators in your AI tool's interface
  2. Try asking explicitly: "What promptfoo tools do you have access to?"
  3. Check your AI tool's logs for MCP connection errors

Tool-Specific Errors​

"Eval not found":

  • Use list_evaluations first to see available evaluation IDs
  • Ensure you're in a directory with promptfoo evaluation data

"Config error":

  • Run validate_promptfoo_config to check your configuration
  • Verify promptfooconfig.yaml exists and is valid (a minimal example is sketched below)
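
A minimal promptfooconfig.yaml, for reference (a sketch; substitute your own prompts, providers, and assertions):

prompts:
  - 'Summarize the following text: {{text}}'

providers:
  - openai:gpt-4o-mini

tests:
  - vars:
      text: 'Promptfoo is an open-source tool for evaluating LLM applications.'
    assert:
      - type: contains
        value: 'Promptfoo'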

"Provider error":

  • Use test_provider to diagnose connectivity and authentication issues
  • Check your API keys and provider configurations

Advanced Usage​

Custom HTTP Integrations​

For HTTP transport, you can integrate with any system that supports HTTP:

// Example: Call the MCP server from Node.js (Node 18+ provides the global fetch API)
// MCP messages use JSON-RPC 2.0, so the request includes "jsonrpc" and "id" fields.
const response = await fetch('http://localhost:3100/mcp', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    jsonrpc: '2.0',
    id: 1,
    method: 'tools/call',
    params: {
      name: 'list_evaluations',
      arguments: { datasetId: 'my-dataset' },
    },
  }),
});

Environment Variables​

The MCP server respects all promptfoo environment variables:

# Set provider API keys
export OPENAI_API_KEY=sk-...
export ANTHROPIC_API_KEY=sk-ant-...

# Configure promptfoo behavior
export PROMPTFOO_CONFIG_DIR=/path/to/configs
export PROMPTFOO_OUTPUT_DIR=/path/to/outputs

# Start server with environment
npx promptfoo@latest mcp --transport stdio

Resources​