Promptfoo MCP Server

Expose promptfoo's eval tools to AI agents via the Model Context Protocol (MCP).

Prerequisites
  • Node.js installed on your system
  • A promptfoo project with some evaluations (for testing the connection)
  • Cursor IDE, Claude Desktop, or another MCP-compatible AI tool

Quick Start

1. Start the Server

# For Cursor, Claude Desktop (STDIO transport)
npx promptfoo@latest mcp --transport stdio

# For web tools (HTTP transport)
npx promptfoo@latest mcp --transport http --port 3100

2. Configure Your AI Tool

Cursor: Create .cursor/mcp.json in your project root

.cursor/mcp.json
{
  "mcpServers": {
    "promptfoo": {
      "command": "npx",
      "args": ["promptfoo@latest", "mcp", "--transport", "stdio"],
      "description": "Promptfoo MCP server for LLM evaluation and testing"
    }
  }
}

Development vs Production Configuration

For regular usage: Always use npx promptfoo@latest as shown above.

For promptfoo contributors: The .cursor/mcp.json in the promptfoo repository uses development commands (ts-node src/main.ts) to run from source code. Don't copy that configuration for regular usage.

Claude Desktop: Add to config file

Config file locations:

  • macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
  • Windows: %APPDATA%\Claude\claude_desktop_config.json
  • Linux: ~/.config/Claude/claude_desktop_config.json

claude_desktop_config.json
{
  "mcpServers": {
    "promptfoo": {
      "command": "npx",
      "args": ["promptfoo@latest", "mcp", "--transport", "stdio"],
      "description": "Promptfoo MCP server for LLM evaluation and testing"
    }
  }
}

Restart your AI tool after adding the configuration.

3. Test the Connection

After restarting your AI tool, you should see promptfoo tools available. Try asking:

"List my recent evaluations using the promptfoo tools"

Available Tools

Core Evaluation Tools

  • list_evaluations - Browse your evaluation runs with optional dataset filtering
  • get_evaluation_details - Get comprehensive results, metrics, and test cases for a specific evaluation
  • run_evaluation - Execute evaluations with custom parameters, test case filtering, and concurrency control
  • share_evaluation - Generate publicly shareable URLs for evaluation results
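
Under the hood, an agent invokes each of these through a standard MCP tools/call request. A minimal sketch for get_evaluation_details (the id argument name and value are assumptions; the tool schema the server advertises is authoritative):

{
  "method": "tools/call",
  "params": {
    "name": "get_evaluation_details",
    "arguments": { "id": "eval-abc123" }
  }
}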

Generation Tools

  • generate_dataset - Generate test datasets using AI for comprehensive evaluation coverage
  • generate_test_cases - Generate test cases with assertions for existing prompts
  • compare_providers - Compare multiple AI providers side-by-side for performance and quality

Redteam Security Tools

  • redteam_run - Execute comprehensive security testing against AI applications with dynamic attack probes
  • redteam_generate - Generate adversarial test cases for redteam security testing with configurable plugins and strategies

Configuration & Testing

  • validate_promptfoo_config - Validate configuration files using the same logic as the CLI
  • test_provider - Test AI provider connectivity, credentials, and response quality
  • run_assertion - Test individual assertion rules against outputs for debugging
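
run_assertion is handy when a single assertion fails and you want to debug it in isolation. A minimal sketch, assuming the tool takes an assertion object and an output string (argument names are assumptions; check the schema the server advertises):

{
  "method": "tools/call",
  "params": {
    "name": "run_assertion",
    "arguments": {
      "assertion": { "type": "contains", "value": "refund" },
      "output": "You can request a refund within 30 days."
    }
  }
}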

Example Workflows

1. Basic Evaluation Workflow

Ask your AI assistant:

"Help me run an evaluation. First, validate my config, then list recent evaluations, and finally run a new evaluation with just the first 5 test cases."

The AI will use these tools in sequence:

  1. validate_promptfoo_config - Check your configuration
  2. list_evaluations - Show recent runs
  3. run_evaluation - Execute with test case filtering

2. Provider Comparison

"Compare the performance of GPT-4, Claude 3, and Gemini Pro on my customer support prompt."

The AI will:

  1. test_provider - Verify each provider works
  2. compare_providers - Run side-by-side comparison
  3. Analyze results and provide recommendations
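
A sketch of the underlying compare_providers call, using promptfoo's provider ID syntax (the providers argument name is an assumption; adjust the model IDs to what you actually use):

{
  "method": "tools/call",
  "params": {
    "name": "compare_providers",
    "arguments": {
      "providers": ["openai:gpt-4", "anthropic:messages:claude-3-opus-20240229", "google:gemini-pro"]
    }
  }
}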

3. Security Testing

"Run a security audit on my chatbot prompt to check for jailbreak vulnerabilities."

The AI will:

  1. redteam_generate - Create adversarial test cases
  2. redteam_run - Execute security tests
  3. get_evaluation_details - Analyze vulnerabilities found

4. Dataset Generation

"Generate 20 diverse test cases for my email classification prompt, including edge cases."

The AI will:

  1. generate_dataset - Create test data with AI
  2. generate_test_cases - Add appropriate assertions
  3. run_evaluation - Test the generated cases

Transport Types

Choose the appropriate transport based on your use case:

  • STDIO (--transport stdio): For desktop AI tools (Cursor, Claude Desktop) that communicate via stdin/stdout
  • HTTP (--transport http): For web applications, APIs, and remote integrations that need HTTP endpoints
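
For example, to bring up the HTTP transport and confirm it is reachable:

# Start the server
npx promptfoo@latest mcp --transport http --port 3100

# In another terminal, hit the health endpoint
curl http://localhost:3100/health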

Best Practices

1. Start Small

Begin with simple tools like list_evaluations and validate_promptfoo_config before moving to more complex operations.

2. Use Filtering

When working with large datasets:

  • Filter evaluations by dataset ID
  • Use test case indices to run partial evaluations
  • Apply prompt/provider filters for focused testing
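
As a sketch, a filtered run_evaluation call might look like this (argument names such as testCaseIndices and maxConcurrency are assumptions; check the tool schema the server reports):

{
  "method": "tools/call",
  "params": {
    "name": "run_evaluation",
    "arguments": {
      "testCaseIndices": [0, 1, 2, 3, 4],
      "maxConcurrency": 2
    }
  }
}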

3. Iterative Testing

  1. Validate configuration first
  2. Test providers individually before comparisons
  3. Run small evaluation subsets before full runs
  4. Review results with get_evaluation_details

4. Security First

When using redteam tools:

  • Start with basic plugins before advanced attacks
  • Review generated test cases before running
  • Always analyze results thoroughly

Troubleshooting

Server Issues

Server won't start:

# Verify promptfoo installation
npx promptfoo@latest --version

# Check if you have a valid promptfoo project
npx promptfoo@latest validate

# Test the MCP server manually
npx promptfoo@latest mcp --transport stdio
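
If the server starts but you are unsure it is responding, pipe a raw MCP initialize request into it; any JSON-RPC reply on stdout means the server is alive (this assumes the newline-delimited JSON-RPC framing that MCP stdio servers use):

echo '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"smoke-test","version":"0.0.0"}}}' | npx promptfoo@latest mcp --transport stdio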

Port conflicts (HTTP mode):

# Use a different port
npx promptfoo@latest mcp --transport http --port 8080

# Check what's using port 3100
lsof -i :3100 # macOS/Linux
netstat -ano | findstr :3100 # Windows

AI Tool Connection Issues

AI tool can't connect:

  1. Verify config syntax: Ensure your JSON configuration exactly matches the examples above
  2. Check file paths: Confirm config files are in the correct locations
  3. Restart completely: Close your AI tool entirely and reopen it
  4. Test HTTP endpoint: For HTTP transport, verify with curl http://localhost:3100/health

Tools not appearing:

  1. Look for MCP or "tools" indicators in your AI tool's interface
  2. Try asking explicitly: "What promptfoo tools do you have access to?"
  3. Check your AI tool's logs for MCP connection errors

Tool-Specific Errors

"Eval not found":

  • Use list_evaluations first to see available evaluation IDs
  • Ensure you're in a directory with promptfoo evaluation data

"Config error":

  • Run validate_promptfoo_config to check your configuration
  • Verify promptfooconfig.yaml exists and is valid

"Provider error":

  • Use test_provider to diagnose connectivity and authentication issues
  • Check your API keys and provider configurations

Advanced Usage

Custom HTTP Integrations

For HTTP transport, you can integrate with any system that supports HTTP:

// Example: Call MCP server from Node.js
const response = await fetch('http://localhost:3100/mcp', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
method: 'tools/call',
params: {
name: 'list_evaluations',
arguments: { datasetId: 'my-dataset' },
},
}),
});
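
The same call from the command line, mirroring the payload above:

curl -X POST http://localhost:3100/mcp \
  -H 'Content-Type: application/json' \
  -d '{"method":"tools/call","params":{"name":"list_evaluations","arguments":{"datasetId":"my-dataset"}}}'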

Environment Variables

The MCP server respects all promptfoo environment variables:

# Set provider API keys
export OPENAI_API_KEY=sk-...
export ANTHROPIC_API_KEY=sk-ant-...

# Configure promptfoo behavior
export PROMPTFOO_CONFIG_DIR=/path/to/configs
export PROMPTFOO_OUTPUT_DIR=/path/to/outputs

# Start server with environment
npx promptfoo@latest mcp --transport stdio

Resources