OpenAI Agents

Test multi-turn agentic workflows built with the @openai/agents SDK. Evaluate agents that use tools, hand off between specialists, and handle multi-step tasks.

Prerequisites

  • Install SDK: npm install @openai/agents
  • Set OPENAI_API_KEY environment variable
  • Agent definition (inline or in a TypeScript/JavaScript file)

Basic Usage

providers:
  - id: openai:agents:my-agent
    config:
      agent:
        name: Customer Support Agent
        model: gpt-5-mini
        instructions: You are a helpful customer support agent.
      maxTurns: 10

Configuration Options

| Parameter | Description | Default |
| --- | --- | --- |
| agent | Agent definition (inline object or file://path) | - |
| tools | Tool definitions (inline array or file://path) | - |
| handoffs | Agent handoff definitions (inline array or file://path) | - |
| maxTurns | Maximum conversation turns | 10 |
| model | Override model specified in agent definition | - |
| modelSettings | Model parameters (temperature, topP, maxTokens) | - |
| inputGuardrails | Input validation guardrails (inline array or file://) | - |
| outputGuardrails | Output validation guardrails (inline array or file://) | - |
| tracing | Enable OpenTelemetry OTLP tracing | false |
| otlpEndpoint | Custom OTLP endpoint URL for tracing | http://localhost:4318 |
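
To make the defaults concrete, here is a minimal TypeScript sketch of how these options could be modelled and defaulted. The `AgentProviderOptions` type and the `withDefaults` helper are hypothetical names used only for illustration and are not Promptfoo's actual internals; only the option names and default values come from the table above.

```typescript
// Hypothetical model of the provider options listed above (not Promptfoo's real types).
interface AgentProviderOptions {
  agent: string | Record<string, unknown>; // inline object or file:// path
  tools?: string | unknown[];
  handoffs?: string | unknown[];
  maxTurns?: number;
  model?: string;
  modelSettings?: { temperature?: number; topP?: number; maxTokens?: number };
  inputGuardrails?: string | unknown[];
  outputGuardrails?: string | unknown[];
  tracing?: boolean;
  otlpEndpoint?: string;
}

// Fill in the documented defaults for any option the user left unset.
function withDefaults(options: AgentProviderOptions) {
  return {
    ...options,
    maxTurns: options.maxTurns ?? 10,
    tracing: options.tracing ?? false,
    otlpEndpoint: options.otlpEndpoint ?? 'http://localhost:4318',
  };
}

const resolved = withDefaults({ agent: 'file://./agents/support-agent.ts', maxTurns: 15 });
console.log(resolved.maxTurns, resolved.tracing, resolved.otlpEndpoint);
// prints: 15 false http://localhost:4318
```

Options without a documented default (such as `tools` or `model`) are simply left unset in this sketch.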

File-Based Configuration

Load agent and tools from external files:

providers:
  - id: openai:agents:support-agent
    config:
      agent: file://./agents/support-agent.ts
      tools: file://./tools/support-tools.ts
      maxTurns: 15
      tracing: true

Example agent file (agents/support-agent.ts):

import { Agent } from '@openai/agents';

export default new Agent({
  name: 'Support Agent',
  model: 'gpt-5-mini',
  instructions: 'You are a helpful customer support agent.',
});

Example tools file (tools/support-tools.ts):

import { tool } from '@openai/agents';
import { z } from 'zod';

export const lookupOrder = tool({
  name: 'lookup_order',
  description: 'Look up order status by order ID',
  parameters: z.object({
    order_id: z.string().describe('The order ID'),
  }),
  // Stubbed response; a real tool would query an order system using order_id.
  execute: async ({ order_id }) => {
    return { status: 'shipped', tracking: 'ABC123' };
  },
});

export default [lookupOrder];

Agent Handoffs

Transfer conversations between specialized agents:

providers:
  - id: openai:agents:triage
    config:
      agent:
        name: Triage Agent
        model: gpt-5-mini
        instructions: Route questions to the appropriate specialist.
      handoffs:
        - agent:
            name: Technical Support
            model: gpt-5-mini
            instructions: Handle technical troubleshooting.
          description: Transfer for technical issues

Guardrails

Validate tool inputs and outputs with guardrails (added in SDK v0.3.8):

providers:
  - id: openai:agents:secure-agent
    config:
      agent: file://./agents/secure-agent.ts
      inputGuardrails: file://./guardrails/input-guardrails.ts
      outputGuardrails: file://./guardrails/output-guardrails.ts

Guardrails run validation logic before tool execution (input) and after (output), enabling content filtering, PII detection, or custom business rules.
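
As a rough illustration of the kind of check a guardrail might perform, the sketch below flags text that appears to contain an email address. The `GuardrailResult` shape and the `checkForEmail` function are assumptions made for this example; consult the SDK's guardrail types for the interface a real guardrail file must export.

```typescript
// Hypothetical result shape for illustration; check the SDK's guardrail types for the real one.
interface GuardrailResult {
  tripwireTriggered: boolean;
  outputInfo: { reason: string | null };
}

// Deliberately simple pattern; real PII detection needs far more than one regex.
const EMAIL_PATTERN = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/;

// Flag text that appears to contain an email address.
function checkForEmail(text: string): GuardrailResult {
  const match = EMAIL_PATTERN.test(text);
  return {
    tripwireTriggered: match,
    outputInfo: { reason: match ? 'email address detected' : null },
  };
}

console.log(checkForEmail('Contact me at jane@example.com').tripwireTriggered); // true
console.log(checkForEmail('Where is order 123?').tripwireTriggered); // false
```

The same function could back either an input or an output check, since it only inspects a string.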

Tracing

Enable OpenTelemetry tracing to debug agent execution:

providers:
  - id: openai:agents:my-agent
    config:
      agent: file://./agents/my-agent.ts
      tracing: true # Exports to http://localhost:4318

With a custom OTLP endpoint:

providers:
  - id: openai:agents:my-agent
    config:
      agent: file://./agents/my-agent.ts
      tracing: true
      otlpEndpoint: https://otel-collector.example.com:4318

Or enable globally:

export PROMPTFOO_TRACING_ENABLED=true
npx promptfoo eval

Traces include agent execution spans, tool invocations, model calls, handoff events, and token usage.

Once Promptfoo is collecting those traces, you can assert on the agent's path instead of only its final message:

tests:
  - vars:
      query: 'Find order 123 and tell me whether it shipped'
    assert:
      - type: trajectory:tool-used
        value: search_orders

      - type: trajectory:tool-sequence
        value:
          steps:
            - search_orders
            - compose_reply

      - type: trajectory:goal-success
        value: 'Determine whether order 123 shipped and tell the user the correct status'
        provider: openai:gpt-5-mini

See Tracing for the eval-level OTLP setup required when you want Promptfoo to ingest and evaluate these traces directly.
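
To give an intuition for what a tool-sequence check verifies, the sketch below tests whether the expected steps occur, in order, within a list of observed tool calls. It is a conceptual illustration only: the `matchesToolSequence` helper and the `lookup_customer` tool name are hypothetical, and Promptfoo's actual matching rules (for example, whether extra calls between steps are allowed) may differ.

```typescript
// Returns true when every expected step appears in the observed tool calls, in order.
// Other calls may appear in between (an assumption about the matching semantics).
function matchesToolSequence(observed: string[], steps: string[]): boolean {
  let next = 0; // index of the next expected step still to be found
  for (const name of observed) {
    if (next < steps.length && name === steps[next]) next++;
  }
  return next === steps.length;
}

// Hypothetical trace: tool names in the order the agent called them.
const observedCalls = ['search_orders', 'lookup_customer', 'compose_reply'];
console.log(matchesToolSequence(observedCalls, ['search_orders', 'compose_reply'])); // true
console.log(matchesToolSequence(observedCalls, ['compose_reply', 'search_orders'])); // false
```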

Example: D&D Dungeon Master

Full working example with D&D mechanics, dice rolling, and character management:

description: D&D Adventure with AI Dungeon Master

prompts:
  - '{{query}}'

providers:
  - id: openai:agents:dungeon-master
    config:
      agent: file://./agents/dungeon-master-agent.ts
      tools: file://./tools/game-tools.ts
      maxTurns: 20
      tracing: true

tests:
  - description: Dragon combat with attack roll
    vars:
      query: 'I draw my longsword and attack the red dragon!'
    assert:
      - type: llm-rubric
        value: Response includes dice rolls for attack and damage

  - description: Check character stats
    vars:
      query: 'What are my character stats and current HP?'
    assert:
      - type: contains-any
        value: ['Thorin', 'Fighter', 'level 5']

Tip: Try the interactive example: npx promptfoo@latest init --example openai-agents-basic

Environment Variables

| Variable | Description |
| --- | --- |
| OPENAI_API_KEY | OpenAI API key (required) |
| PROMPTFOO_TRACING_ENABLED | Enable tracing globally |
| OPENAI_BASE_URL | Custom OpenAI API base URL |
| OPENAI_ORGANIZATION | OpenAI organization ID |

Limitations

Warning: Tools must be async functions. Synchronous tools will cause runtime errors.

  • Agent definition files must be TypeScript or JavaScript
  • File paths require file:// prefix (relative paths resolve from config file location)
  • Default maximum: 10 turns (configure with maxTurns)
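
If an existing helper is synchronous, one way to satisfy the async requirement is to wrap it so that it always returns a Promise. The sketch below is a generic TypeScript pattern rather than anything provided by Promptfoo or the SDK; `toAsync` and `lookupOrderSync` are hypothetical names.

```typescript
// Wrap a synchronous handler so it always returns a Promise.
function toAsync<A, R>(fn: (args: A) => R): (args: A) => Promise<R> {
  return async (args: A): Promise<R> => fn(args);
}

// Hypothetical synchronous lookup used only for illustration.
const lookupOrderSync = ({ order_id }: { order_id: string }) => ({
  order_id,
  status: 'shipped',
});

// The wrapped version could then be passed as a tool's execute handler.
const lookupOrderAsync = toAsync(lookupOrderSync);
lookupOrderAsync({ order_id: 'A1' }).then((result) => console.log(result.status)); // shipped
```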