# Test Case Configuration

Define evaluation scenarios with variables, assertions, and test data.
## Inline Tests

The simplest way to define tests is directly in your config:

```yaml title="promptfooconfig.yaml"
tests:
  - vars:
      question: 'What is the capital of France?'
    assert:
      - type: contains
        value: 'Paris'
  - vars:
      question: 'What is 2 + 2?'
    assert:
      - type: equals
        value: '4'
```
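Run the suite with the promptfoo CLI (the same command used for the metadata filters shown later in this page):

```sh
promptfoo eval
```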
## Test Structure

Each test case can include:

```yaml
tests:
  - description: 'Optional test description'
    vars:
      # Variables to substitute in prompts
      var1: value1
      var2: value2
    assert:
      # Expected outputs and validations
      - type: contains
        value: 'expected text'
    metadata:
      # Filterable metadata
      category: math
      difficulty: easy
```
## External Test Files

For larger test suites, store tests in separate files:

```yaml title="promptfooconfig.yaml"
tests: file://tests.yaml
```
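The referenced file holds a plain list of test cases, in the same shape as inline `tests` entries (a minimal sketch):

```yaml title="tests.yaml"
- vars:
    question: 'What is the capital of France?'
  assert:
    - type: contains
      value: 'Paris'
```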
Or load multiple files:

```yaml
tests:
  - file://basic_tests.yaml
  - file://advanced_tests.yaml
  - file://edge_cases/*.yaml
```
## CSV Format

CSV is ideal for bulk test data:

```yaml title="promptfooconfig.yaml"
tests: file://test_cases.csv
```

### Basic CSV

```csv title="test_cases.csv"
question,expectedAnswer
"What is 2+2?","4"
"What is the capital of France?","Paris"
"Who wrote Romeo and Juliet?","Shakespeare"
```
Variables are automatically mapped from column headers.
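Each row becomes one test case, so a prompt can reference the columns directly. A minimal sketch (note that `expectedAnswer` is just another variable unless an assertion references it):

```yaml
prompts:
  - 'Answer briefly: {{question}}'
```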
### CSV with Assertions

Use special `__expected` columns for assertions:

```csv title="test_cases.csv"
input,__expected
"Hello world","contains: Hello"
"Calculate 5 * 6","equals: 30"
"What's the weather?","llm-rubric: Provides weather information"
```

Multiple assertions:

```csv title="test_cases.csv"
question,__expected1,__expected2,__expected3
"What is 2+2?","equals: 4","contains: four","javascript: output.length < 10"
```
### Special CSV Columns

| Column | Purpose | Example |
| --- | --- | --- |
| `__expected` | Single assertion | `contains: Paris` |
| `__expected1`, `__expected2`, ... | Multiple assertions | `equals: 42` |
| `__description` | Test description | `Basic math test` |
| `__prefix` | Prepend to prompt | `You must answer:` |
| `__suffix` | Append to prompt | `(be concise)` |
| `__metric` | Metric name for assertions | `accuracy` |
| `__threshold` | Pass threshold | `0.8` |
| `__metadata:*` | Filterable metadata | See below |
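A sketch combining several of these columns in one file:

```csv
question,__expected,__description,__prefix
"What is 2+2?","equals: 4","Basic math test","You must answer: "
```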
### Metadata in CSV

Add filterable metadata:

```csv title="test_cases.csv"
question,__expected,__metadata:category,__metadata:difficulty
"What is 2+2?","equals: 4","math","easy"
"Explain quantum physics","llm-rubric: Accurate explanation","science","hard"
```

Use a `[]` suffix for array metadata; values are comma-separated, and a literal comma is escaped as `\,`:

```csv
topic,__metadata:tags[]
"Machine learning","ai,technology,data science"
"Climate change","environment,science,global\,warming"
```

Filter tests by metadata at eval time:

```sh
promptfoo eval --filter-metadata category=math
promptfoo eval --filter-metadata difficulty=easy
promptfoo eval --filter-metadata tags=ai
```
### JSON in CSV

Include structured data as JSON strings (inner quotes are doubled per CSV quoting rules):

```csv title="test_cases.csv"
query,context,__expected
"What's the temperature?","{""location"":""NYC"",""units"":""celsius""}","contains: celsius"
```

Access it in prompts with the `load` filter, which parses the JSON string into an object:

```yaml
prompts:
  - 'Query: {{query}}, Location: {{(context | load).location}}'
```
## Dynamic Test Generation

Generate tests programmatically:

### JavaScript/TypeScript

```yaml title="promptfooconfig.yaml"
tests: file://generate_tests.js
```

```js title="generate_tests.js"
module.exports = async function () {
  // Fetch data, compute test cases, etc.
  const testCases = [];
  for (let i = 1; i <= 10; i++) {
    testCases.push({
      description: `Test case ${i}`,
      vars: {
        number: i,
        squared: i * i,
      },
      assert: [
        {
          type: 'contains',
          value: String(i * i),
        },
      ],
    });
  }
  return testCases;
};
```
### Python

```yaml title="promptfooconfig.yaml"
tests: file://generate_tests.py:create_tests
```

```python title="generate_tests.py"
def create_tests():
    test_cases = []
    # Load test data from a database, API, etc.
    # (load_test_data is your own loader; see the sketch below)
    test_data = load_test_data()
    for item in test_data:
        test_cases.append({
            "vars": {
                "input": item["input"],
                "context": item["context"],
            },
            "assert": [{
                "type": "contains",
                "value": item["expected"],
            }],
        })
    return test_cases
```
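`load_test_data` above is a placeholder for your own data source. A minimal stand-in that reads a JSON fixture might look like this; the filename `test_data.json` and its record shape are assumptions for illustration:

```python
import json

def load_test_data():
    # Hypothetical fixture: a JSON array of records with
    # "input", "context", and "expected" keys.
    with open("test_data.json") as f:
        return json.load(f)
```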
### With Configuration

Pass configuration to generators:

```yaml title="promptfooconfig.yaml"
tests:
  - path: file://generate_tests.py:create_tests
    config:
      dataset: 'validation'
      category: 'math'
      sample_size: 100
```

```python title="generate_tests.py"
def create_tests(config):
    dataset = config.get('dataset', 'train')
    category = config.get('category', 'all')
    size = config.get('sample_size', 50)
    # Use the configuration to generate tests
    # (generate_test_cases is a placeholder for your own logic)
    return generate_test_cases(dataset, category, size)
```
## JSON/JSONL Format

### JSON Array

```json title="tests.json"
[
  {
    "vars": {
      "topic": "artificial intelligence"
    },
    "assert": [
      {
        "type": "contains",
        "value": "AI"
      }
    ]
  },
  {
    "vars": {
      "topic": "climate change"
    },
    "assert": [
      {
        "type": "llm-rubric",
        "value": "Discusses environmental impact"
      }
    ]
  }
]
```

### JSONL (One test per line)

```jsonl title="tests.jsonl"
{"vars": {"x": 5, "y": 3}, "assert": [{"type": "equals", "value": "8"}]}
{"vars": {"x": 10, "y": 7}, "assert": [{"type": "equals", "value": "17"}]}
```
## Loading Media Files

Include images, PDFs, and other files as variables:

```yaml title="promptfooconfig.yaml"
tests:
  - vars:
      image: file://images/chart.png
      document: file://docs/report.pdf
      data: file://data/config.yaml
```

### Supported File Types

| Type | Handling | Usage |
| --- | --- | --- |
| Images (png, jpg, etc.) | Converted to base64 | Vision models |
| Videos (mp4, etc.) | Converted to base64 | Multimodal models |
| PDFs | Text extraction | Document analysis |
| Text files | Loaded as string | Any use case |
| YAML/JSON | Parsed to object | Structured data |
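Because YAML/JSON variables arrive as parsed objects, templates can reach into their fields. A sketch assuming `config.yaml` defines a top-level `model_name` key:

```yaml
prompts:
  - 'Describe the model named {{ data.model_name }}'
tests:
  - vars:
      data: file://data/config.yaml
```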
### Example: Vision Model Test

```yaml
tests:
  - vars:
      image: file://test_image.jpg
      question: 'What objects are in this image?'
    assert:
      - type: contains
        value: 'dog'
```

In your prompt, the image variable slots directly into a base64 data URL, since image files are converted to base64 (see the table above):

```json
[
  {
    "role": "user",
    "content": [
      { "type": "text", "text": "{{question}}" },
      {
        "type": "image_url",
        "image_url": {
          "url": "data:image/jpeg;base64,{{image}}"
        }
      }
    ]
  }
]
```
## Best Practices

### 1. Organize Test Data

```text
project/
├── promptfooconfig.yaml
├── prompts/
│   └── main_prompt.txt
└── tests/
    ├── basic_functionality.csv
    ├── edge_cases.yaml
    └── regression_tests.json
```
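A config wiring up that layout might look like:

```yaml title="promptfooconfig.yaml"
prompts:
  - file://prompts/main_prompt.txt
tests:
  - file://tests/basic_functionality.csv
  - file://tests/edge_cases.yaml
  - file://tests/regression_tests.json
```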
### 2. Use Descriptive Names

```yaml
tests:
  - description: 'Test French translation with formal tone'
    vars:
      text: 'Hello'
      language: 'French'
      tone: 'formal'
```
### 3. Group Related Tests

```yaml
# Use metadata for organization
tests:
  - vars:
      query: 'Reset password'
    metadata:
      feature: authentication
      priority: high
```
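Metadata attached this way feeds straight into filtered runs:

```sh
promptfoo eval --filter-metadata feature=authentication
```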
### 4. Combine Approaches

```yaml
tests:
  # Quick smoke tests inline
  - vars:
      test: 'quick check'
  # Comprehensive test suite from file
  - file://tests/full_suite.csv
  # Dynamic edge case generation
  - file://tests/generate_edge_cases.js
```
## Common Patterns

### A/B Testing Variables

```csv title="ab_tests.csv"
message_style,greeting,__expected
"formal","Good morning","contains: Good morning"
"casual","Hey there","contains: Hey"
"friendly","Hello!","contains: Hello"
```
### Error Handling Tests

```yaml
tests:
  - description: 'Handle empty input'
    vars:
      input: ''
    assert:
      - type: contains
        value: 'provide more information'
```
### Performance Tests

```yaml
tests:
  - vars:
      prompt: 'Simple question'
    assert:
      - type: latency
        threshold: 1000 # milliseconds
```
## Loading from Google Sheets

See the Google Sheets integration for details on loading test data directly from spreadsheets.