# Python
Promptfoo is written in TypeScript and runs via Node.js, but it has first-class Python support. You can use Python for any part of your eval pipeline without writing JavaScript.
Use Python for:
- Providers: call custom models, wrap APIs, run Hugging Face/PyTorch
- Assertions: validate outputs with custom scoring logic
- Test generators: load test cases from databases, APIs, or generate them programmatically
- Prompts: build prompts dynamically based on test variables
- Framework integrations: test LangChain, LangGraph, CrewAI, and other agent frameworks
The `file://` prefix tells promptfoo to execute a Python function; promptfoo detects your Python installation automatically. A single config can use Python at every step:
```yaml
# yaml-language-server: $schema=https://promptfoo.dev/config-schema.json
prompts:
  - file://prompts.py:create_prompt # Python generates the prompt
providers:
  - file://provider.py # Python calls the model
tests:
  - file://tests.py:generate_tests # Python generates test cases
defaultTest:
  assert:
    - type: python # Python validates the output
      value: file://assert.py:check
```
For example, a minimal `provider.py`:

```python
from openai import OpenAI

client = OpenAI()

def call_api(prompt, options, context):
    response = client.responses.create(
        model="gpt-5.1-mini",
        input=prompt,
    )
    return {"output": response.output_text}
```

To scaffold a runnable version of this setup:

```sh
npx promptfoo@latest init --example python-provider
```
## Providers

Use `file://` to reference a Python file:

```yaml
providers:
  - file://provider.py # Uses call_api() by default
  - file://provider.py:custom_function # Specify a function name
```
Your function receives three arguments and returns a dict:
```python
def call_api(prompt, options, context):  # or: async def call_api(...)
    # prompt: string or JSON-encoded messages
    # options: {"config": {...}} from the YAML provider block
    # context: {"vars": {...}} from the test case
    return {
        "output": "response text",
        # Optional:
        "tokenUsage": {"total": 100, "prompt": 20, "completion": 80},
        "cost": 0.001,
    }
```
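Anything you set under `config:` for a provider is passed through in `options`. A minimal sketch, where `system_prompt` is a hypothetical key of our own, not a built-in setting:

```yaml
providers:
  - id: file://provider.py
    config:
      system_prompt: 'Answer in one sentence.' # hypothetical key, passed through as-is
```

```python
def call_api(prompt, options, context):
    # Read the hypothetical system_prompt key defined in the YAML above
    system_prompt = options.get("config", {}).get("system_prompt", "")
    # ...pass system_prompt and prompt to your model here...
    return {"output": f"[{system_prompt}] {prompt}"}  # placeholder response
```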
## Assertions

Use `type: python` to run custom validation:

```yaml
assert:
  # Inline expression (returns bool or float 0-1)
  - type: python
    value: "'keyword' in output.lower()"
  # External file
  - type: python
    value: file://assert.py
```
For external files, define a get_assert function:
```python
def get_assert(output, context):
    # Return bool, float (0-1), or a detailed result
    return {
        "pass": True,
        "score": 0.9,
        "reason": "Meets criteria",
    }
```
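As with providers, `context` carries the test case, so `context["vars"]` gives the assertion access to the test's variables. A sketch that scores against a hypothetical `expected` var, named `check` to match the `file://assert.py:check` reference in the overview config:

```python
def check(output, context):
    # Hypothetical: each test case defines an "expected" var to look for
    expected = context["vars"].get("expected", "")
    found = expected.lower() in output.lower()
    return {
        "pass": found,
        "score": 1.0 if found else 0.0,
        "reason": f"{expected!r} {'found' if found else 'missing'} in output",
    }
```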
## Test Generators

Load or generate test cases from Python:

```yaml
tests:
  - file://tests.py:generate_tests
```
```python
def generate_tests(config=None):
    # Load from a database, API, files, etc.
    return [
        {"vars": {"input": "test 1"}, "assert": [{"type": "contains", "value": "expected"}]},
        {"vars": {"input": "test 2"}},
    ]
```
Pass configuration from YAML:

```yaml
tests:
  - path: file://tests.py:generate_tests
    config:
      max_cases: 100
      category: 'safety'
```
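On the Python side, that `config` block arrives as the function's argument. A sketch assuming the keys above, with stub data standing in for a real datasource:

```python
def generate_tests(config=None):
    config = config or {}
    max_cases = config.get("max_cases", 10)
    category = config.get("category", "general")
    # Stub cases; in practice, query your database or API here
    return [
        {"vars": {"input": f"{category} case {i}"}}
        for i in range(max_cases)
    ]
```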
## Prompts

Build prompts dynamically:

```yaml
prompts:
  - file://prompts.py:create_prompt
```
```python
def create_prompt(context):
    # Return a string or chat messages
    return [
        {"role": "system", "content": "You are an expert."},
        {"role": "user", "content": f"Explain {context['vars']['topic']}"},
    ]
```
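Returning a plain string works too; promptfoo uses it as the prompt text. A minimal variant of the function above:

```python
def create_prompt(context):
    # Plain-string variant of the chat-message prompt above
    return f"Explain {context['vars']['topic']} to a beginner."
```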
## Framework Integrations
Test Python agent frameworks by wrapping them as providers:
| Framework | Example | Guide |
|---|---|---|
| LangGraph | `langgraph` | Evaluate LangGraph agents |
| LangChain | `langchain-python` | Test LLM chains |
| CrewAI | `crewai` | Evaluate CrewAI agents |
| OpenAI Agents | `openai-agents` | Multi-turn agent workflows |
| PydanticAI | `pydantic-ai` | Type-safe agents with Pydantic |
| Google ADK | `google-adk-example` | Google Agent Development Kit |
| Strands Agents | `strands-agents` | AWS open-source agent framework |
To get started with any example:

```sh
npx promptfoo@latest init --example langgraph
```
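Whichever framework you choose, the wrapper has the same shape: run the agent inside `call_api` and return its final text as `output`. A sketch with LangChain, assuming the `langchain-openai` package is installed (see the examples above for complete setups):

```python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4.1-mini")

def call_api(prompt, options, context):
    # Run the chain/agent and surface its final answer to promptfoo
    result = llm.invoke(prompt)
    return {"output": result.content}
```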
## Jupyter / Colab

Install promptfoo, write a config, and run the eval, each in its own cell (`%%writefile` must be the first line of its cell):

```python
!npm install -g promptfoo
```

```python
%%writefile promptfooconfig.yaml
prompts:
  - "Explain {{topic}}"
providers:
  - openai:gpt-4.1-mini
tests:
  - vars:
      topic: machine learning
```

```python
!npx promptfoo eval
```
## Configuration

### Python Path

Set a custom Python executable:

```sh
export PROMPTFOO_PYTHON=/path/to/python3
```
Or configure per-provider in YAML:
```yaml
providers:
  - id: file://provider.py
    config:
      pythonExecutable: ./venv/bin/python
```
### Module Paths

Add directories to the Python path:

```sh
export PYTHONPATH=/path/to/modules:$PYTHONPATH
```
## Debugging

Enable debug output to see Python execution details:

```sh
LOG_LEVEL=debug npx promptfoo eval
```
## Troubleshooting
See Python provider troubleshooting for common issues like Python not found, module import errors, and timeout problems.