Test Case Configuration
Define evaluation scenarios with variables, assertions, and test data.
Inline Tests
The simplest way to define tests is directly in your config:
tests:
  - vars:
      question: 'What is the capital of France?'
    assert:
      - type: contains
        value: 'Paris'
  - vars:
      question: 'What is 2 + 2?'
    assert:
      - type: equals
        value: '4'
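Once defined, the whole suite runs with the same CLI command used throughout this page:
promptfoo eval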
Test Structure
Each test case can include:
tests:
  - description: 'Optional test description'
    vars:
      # Variables to substitute in prompts
      var1: value1
      var2: value2
    assert:
      # Expected outputs and validations
      - type: contains
        value: 'expected text'
    metadata:
      # Filterable metadata
      category: math
      difficulty: easy
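Fields shared by many tests do not need to be repeated per test case: promptfoo's defaultTest block merges into every test. A minimal sketch, reusing the latency assertion shown later on this page:
defaultTest:
  assert:
    - type: latency
      threshold: 5000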
External Test Files
For larger test suites, store tests in separate files:
tests: file://tests.yaml
Or load multiple files:
tests:
  - file://basic_tests.yaml
  - file://advanced_tests.yaml
  - file://edge_cases/*.yaml
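An external file such as tests.yaml contains a plain YAML list of test cases, in the same shape as inline tests:
- vars:
    question: 'What is the capital of France?'
  assert:
    - type: contains
      value: 'Paris'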
CSV Format
CSV is ideal for bulk test data:
tests: file://test_cases.csv
Basic CSV
question,expectedAnswer
"What is 2+2?","4"
"What is the capital of France?","Paris"
"Who wrote Romeo and Juliet?","Shakespeare"
Variables are automatically mapped from column headers.
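Because expectedAnswer arrives as an ordinary variable, one way to assert against it is a shared defaultTest assertion with Nunjucks templating (a sketch; promptfoo renders variables inside assertion values):
defaultTest:
  assert:
    - type: contains
      value: '{{expectedAnswer}}'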
CSV with Assertions
Use special __expected columns for assertions:
input,__expected
"Hello world","contains: Hello"
"Calculate 5 * 6","equals: 30"
"What's the weather?","llm-rubric: Provides weather information"
Multiple assertions:
question,__expected1,__expected2,__expected3
"What is 2+2?","equals: 4","contains: four","javascript: output.length < 10"
contains-any and contains-all expect comma-delimited values inside the __expected column.
translated_text,__expected
"<span>Hola</span> <b>mundo</b>","contains-any: <b>,</span>"
If you write "contains-any: <b> </span>", promptfoo treats <b> </span> as a single search term rather than two separate tags.
Special CSV Columns
| Column | Purpose | Example |
|---|---|---|
| __expected | Single assertion | contains: Paris |
| __expected1, __expected2, ... | Multiple assertions | equals: 42 |
| __description | Test description | Basic math test |
| __prefix | Prepend to prompt | You must answer: |
| __suffix | Append to prompt | (be concise) |
| __metric | Metric name for assertions | accuracy |
| __threshold | Pass threshold (applies to all asserts) | 0.8 |
| __metadata:* | Filterable metadata | See below |
| __config:__expected:<key> or __config:__expectedN:<key> | Configuration for all or specific assertions | __config:__expected:threshold, __config:__expected2:threshold |
A __metadata column without a key is not supported; name the field explicitly, e.g. __metadata:category. If a CSV file includes a bare __metadata column, promptfoo logs a warning and ignores it.
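Several of these columns can be combined in a single file (values here are illustrative):
question,__expected,__description,__prefix,__suffix
"What is 2+2?","equals: 4","Basic math","Answer concisely: "," (one word only)"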
Metadata in CSV
Add filterable metadata:
question,__expected,__metadata:category,__metadata:difficulty
"What is 2+2?","equals: 4","math","easy"
"Explain quantum physics","llm-rubric: Accurate explanation","science","hard"
Array metadata with []:
topic,__metadata:tags[]
"Machine learning","ai,technology,data science"
"Climate change","environment,science,global\,warming"
Filter tests:
promptfoo eval --filter-metadata category=math
promptfoo eval --filter-metadata difficulty=easy
promptfoo eval --filter-metadata tags=ai
JSON in CSV
Include structured data:
query,context,__expected
"What's the temperature?","{""location"":""NYC"",""units"":""celsius""}","contains: celsius"
Access in prompts:
prompts:
  - 'Query: {{query}}, Location: {{(context | load).location}}'
Dynamic Test Generation
Generate tests programmatically:
JavaScript/TypeScript
tests: file://generate_tests.js
The file exports an async function that returns an array of test cases:
module.exports = async function () {
  // Fetch data, compute test cases, etc.
  const testCases = [];
  for (let i = 1; i <= 10; i++) {
    testCases.push({
      description: `Test case ${i}`,
      vars: {
        number: i,
        squared: i * i,
      },
      assert: [
        {
          type: 'contains',
          value: String(i * i),
        },
      ],
    });
  }
  return testCases;
};
Python
tests: file://generate_tests.py:create_tests
The file exports the function referenced after the colon in the path (create_tests), returning a list of test-case dicts:
def create_tests():
    test_cases = []
    # Load test data from database, API, etc.
    test_data = load_test_data()
    for item in test_data:
        test_cases.append({
            "vars": {
                "input": item["input"],
                "context": item["context"]
            },
            "assert": [{
                "type": "contains",
                "value": item["expected"]
            }]
        })
    return test_cases
With Configuration
Pass configuration to generators:
tests:
  - path: file://generate_tests.py:create_tests
    config:
      dataset: 'validation'
      category: 'math'
      sample_size: 100
The generator receives the config object as its argument:
def create_tests(config):
    dataset = config.get('dataset', 'train')
    category = config.get('category', 'all')
    size = config.get('sample_size', 50)
    # Use configuration to generate tests
    return generate_test_cases(dataset, category, size)
JSON/JSONL Format
JSON Array
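Point the config at the file as with other formats (assuming a file named test_cases.json):
tests: file://test_cases.json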
[
  {
    "vars": {
      "topic": "artificial intelligence"
    },
    "assert": [
      {
        "type": "contains",
        "value": "AI"
      }
    ]
  },
  {
    "vars": {
      "topic": "climate change"
    },
    "assert": [
      {
        "type": "llm-rubric",
        "value": "Discusses environmental impact"
      }
    ]
  }
]
JSONL (One test per line)
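Loading works the same way (assuming a file named tests.jsonl):
tests: file://tests.jsonl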
{"vars": {"x": 5, "y": 3}, "assert": [{"type": "equals", "value": "8"}]}
{"vars": {"x": 10, "y": 7}, "assert": [{"type": "equals", "value": "17"}]}
Loading Media Files
Include images, PDFs, and other files as variables:
tests:
  - vars:
      image: file://images/chart.png
      document: file://docs/report.pdf
      data: file://data/config.yaml
Supported File Types
| Type | Handling | Usage | 
|---|---|---|
| Images (png, jpg, etc.) | Converted to base64 | Vision models | 
| Videos (mp4, etc.) | Converted to base64 | Multimodal models | 
| PDFs | Text extraction | Document analysis | 
| Text files | Loaded as string | Any use case | 
| YAML/JSON | Parsed to object | Structured data | 
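Since YAML and JSON files are parsed into objects, their fields should be reachable with Nunjucks dot notation (a sketch assuming data/config.yaml contains a model_name key):
prompts:
  - 'Use model: {{ data.model_name }}'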
Example: Vision Model Test
tests:
  - vars:
      image: file://test_image.jpg
      question: 'What objects are in this image?'
    assert:
      - type: contains
        value: 'dog'
In your prompt (an OpenAI-style chat message array; the image variable carries the base64 data):
[
  {
    "role": "user",
    "content": [
      { "type": "text", "text": "{{question}}" },
      {
        "type": "image_url",
        "image_url": {
          "url": "data:image/jpeg;base64,{{image}}"
        }
      }
    ]
  }
]
Best Practices
1. Organize Test Data
project/
├── promptfooconfig.yaml
├── prompts/
│   └── main_prompt.txt
└── tests/
    ├── basic_functionality.csv
    ├── edge_cases.yaml
    └── regression_tests.json
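The config then references each file by relative path:
tests:
  - file://tests/basic_functionality.csv
  - file://tests/edge_cases.yaml
  - file://tests/regression_tests.json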
2. Use Descriptive Names
tests:
  - description: 'Test French translation with formal tone'
    vars:
      text: 'Hello'
      language: 'French'
      tone: 'formal'
3. Group Related Tests
# Use metadata for organization
tests:
  - vars:
      query: 'Reset password'
    metadata:
      feature: authentication
      priority: high
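These fields plug directly into the metadata filters shown earlier:
promptfoo eval --filter-metadata feature=authentication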
4. Combine Approaches
tests:
  # Quick smoke tests inline
  - vars:
      test: 'quick check'
  # Comprehensive test suite from file
  - file://tests/full_suite.csv
  # Dynamic edge case generation
  - file://tests/generate_edge_cases.js
Common Patterns
A/B Testing Variables
message_style,greeting,__expected
"formal","Good morning","contains: Good morning"
"casual","Hey there","contains: Hey"
"friendly","Hello!","contains: Hello"
Error Handling Tests
tests:
  - description: 'Handle empty input'
    vars:
      input: ''
    assert:
      - type: contains
        value: 'provide more information'
Performance Tests
tests:
  - vars:
      prompt: 'Simple question'
    assert:
      - type: latency
        threshold: 1000 # milliseconds
External Data Sources
Google Sheets
See Google Sheets integration for details on loading test data directly from spreadsheets.
SharePoint
See SharePoint integration for details on loading test data from Microsoft SharePoint document libraries.
HuggingFace Datasets
See HuggingFace Datasets for instructions on importing test cases from existing datasets.