# OpenAI Codex SDK
This provider makes OpenAI's Codex SDK available for evals. The Codex SDK supports code generation and manipulation with thread-based conversations and JSON schema output.
The OpenAI Codex SDK is a proprietary package and is not installed by default. You must install it separately.
## Provider IDs
You can reference this provider using either:
- `openai:codex-sdk` (full name)
- `openai:codex` (alias)
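Both forms resolve to the same provider, so for example:

```yaml
providers:
  - openai:codex-sdk # full name
  # - openai:codex   # equivalent alias
```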
## Installation
The OpenAI Codex SDK provider requires the `@openai/codex-sdk` package to be installed separately:

```sh
npm install @openai/codex-sdk
```
This is an optional dependency; install it only if you want to use the OpenAI Codex SDK provider. Note that the `@openai/codex-sdk` library may have a proprietary license.
## Setup
Set your OpenAI API key with the `OPENAI_API_KEY` environment variable or specify `apiKey` in the provider configuration. You can create API keys in the OpenAI dashboard.

Example of setting the environment variable:

```sh
export OPENAI_API_KEY=your_api_key_here
```

Alternatively, you can use the `CODEX_API_KEY` environment variable:

```sh
export CODEX_API_KEY=your_api_key_here
```
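If you prefer not to use environment variables, the key can also be set inline via the `apiKey` parameter (shown here with a placeholder value):

```yaml
providers:
  - id: openai:codex-sdk
    config:
      apiKey: your_api_key_here
```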
## Quick Start

### Basic Usage
By default, the Codex SDK runs in the current working directory and requires a Git repository for safety, so that any changes the agent makes to your files can be tracked and reverted.
```yaml
providers:
  - openai:codex-sdk

prompts:
  - 'Write a Python function that calculates the factorial of a number'
```
The provider creates an ephemeral thread for each eval test case.
### With Custom Model
Specify which OpenAI model to use for code generation:
```yaml
providers:
  - id: openai:codex-sdk
    config:
      model: gpt-5.1-codex

prompts:
  - 'Write a TypeScript function that validates email addresses'
```
### With Working Directory
Specify a custom working directory for the Codex SDK to operate in:
```yaml
providers:
  - id: openai:codex-sdk
    config:
      working_dir: ./src
      model: gpt-5.1-codex

prompts:
  - 'Review the codebase and suggest improvements'
```
This allows you to prepare a directory with files before running your tests.
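For example, you might seed a workspace before running the eval (the directory and file names here are illustrative); note that the directory must also be a Git repository unless `skip_git_repo_check` is set:

```sh
# Create a workspace and seed it with a file for the agent to work on
mkdir -p ./workspace
printf 'def add(a, b):\n    return a + b\n' > ./workspace/math_utils.py

# Initialize Git so the provider's repository check passes
git -C ./workspace init -q
```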
### Skipping Git Check
If you need to run in a non-Git directory, you can bypass the Git repository requirement:
```yaml
providers:
  - id: openai:codex-sdk
    config:
      working_dir: ./temp-workspace
      skip_git_repo_check: true
      model: gpt-5.1-codex

prompts:
  - 'Generate a README file for this project'
```
Skipping the Git check removes a safety guard. Use with caution and consider version control for any important code.
## Supported Parameters
| Parameter | Type | Description | Default |
|---|---|---|---|
| `apiKey` | string | OpenAI API key | Environment variable |
| `working_dir` | string | Directory for Codex to operate in | Current directory |
| `additional_directories` | string[] | Additional directories the agent can access | None |
| `model` | string | Primary model to use | Codex SDK default |
| `fallback_model` | string | Fallback model if primary fails | None |
| `max_tokens` | number | Maximum tokens for response | Codex SDK default |
| `tool_output_token_limit` | number | Maximum tokens for tool output | 10000 |
| `skip_git_repo_check` | boolean | Skip Git repository validation | false |
| `codex_path_override` | string | Custom path to codex binary | None |
| `thread_id` | string | Resume existing thread from `~/.codex/sessions` | None (creates new) |
| `persist_threads` | boolean | Keep threads alive between calls | false |
| `thread_pool_size` | number | Max concurrent threads (when `persist_threads` is enabled) | 1 |
| `output_schema` | object | JSON schema for structured responses | None |
| `cli_env` | object | Custom environment variables for Codex CLI | Inherits from process |
| `system_prompt` | string | Custom system instructions | None |
| `enable_streaming` | boolean | Enable streaming events | false |
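For instance, several of these parameters can be combined in a single provider entry:

```yaml
providers:
  - id: openai:codex-sdk
    config:
      model: gpt-5.1-codex
      fallback_model: gpt-5
      working_dir: ./src
      additional_directories:
        - ./tests
      max_tokens: 4000
      tool_output_token_limit: 20000
```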
## Models
The Codex SDK supports OpenAI models. Use `gpt-5.1-codex` for code generation tasks:

```yaml
providers:
  - id: openai:codex-sdk
    config:
      model: gpt-5.1-codex
      fallback_model: gpt-5
```
Supported models include:
### GPT-5.1 Codex Models (Recommended)

- `gpt-5.1-codex` - Primary model for code generation (recommended)
- `gpt-5.1-codex-max` - Frontier model with enhanced reasoning for complex tasks
- `gpt-5.1-codex-mini` - Cost-efficient variant for simpler tasks
### GPT-5 Models

- `gpt-5-codex` - Previous generation codex model
- `gpt-5-codex-mini` - Cost-efficient previous generation
- `gpt-5` - GPT-5 base model
### GPT-4 Models

- `gpt-4o` - GPT-4 Omni
- `gpt-4o-mini` - Cost-efficient GPT-4 Omni variant
- `gpt-4-turbo` - GPT-4 Turbo
- `gpt-4` - GPT-4 base
### Reasoning Models

- `o3-mini` - Mini reasoning model
- `o1` - Reasoning model
- `o1-mini` - Mini reasoning model
### Mini Models
For faster or lower-cost evals, use mini models:
```yaml
providers:
  - id: openai:codex-sdk
    config:
      model: gpt-5.1-codex-mini
```
Or reasoning-optimized models:
```yaml
providers:
  - id: openai:codex-sdk
    config:
      model: o3-mini
```
## Thread Management

The Codex SDK uses thread-based conversations stored in `~/.codex/sessions`. Promptfoo supports three thread management modes:
### Ephemeral Threads (Default)
Creates a new thread for each eval, then discards it:
```yaml
providers:
  - openai:codex-sdk
```
### Persistent Threads
Reuse threads between evals with the same configuration:
```yaml
providers:
  - id: openai:codex-sdk
    config:
      persist_threads: true
      thread_pool_size: 2 # Allow up to 2 concurrent threads
```
Threads are pooled by cache key (working dir + model + output schema + prompt). When the pool is full, the oldest thread is evicted.
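Because the model is part of the cache key, two provider entries that differ in `model` will never share a pooled thread. For example (using promptfoo's standard `label` field to distinguish the entries):

```yaml
providers:
  - id: openai:codex-sdk
    label: codex
    config:
      model: gpt-5.1-codex
      persist_threads: true
  - id: openai:codex-sdk
    label: codex-mini
    config:
      model: gpt-5.1-codex-mini
      persist_threads: true
```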
### Thread Resumption
Resume a specific thread by ID:
```yaml
providers:
  - id: openai:codex-sdk
    config:
      thread_id: abc123def456 # Thread ID from ~/.codex/sessions
      persist_threads: true # Cache the resumed thread
```
## Structured Output

The Codex SDK supports JSON schema output. Specify an `output_schema` to get structured responses:
```yaml
providers:
  - id: openai:codex-sdk
    config:
      model: gpt-5.1-codex
      output_schema:
        type: object
        properties:
          function_name:
            type: string
          parameters:
            type: array
            items:
              type: string
          return_type:
            type: string
        required:
          - function_name
          - parameters
          - return_type

prompts:
  - 'Describe the signature of a function that calculates fibonacci numbers'

tests:
  - assert:
      - type: is-json
      - type: javascript
        value: 'output.function_name.includes("fibonacci")'
```
The output will be valid JSON matching your schema.
### Zod Schemas

You can also use Zod schemas converted with `zod-to-json-schema`:
```yaml
providers:
  - id: openai:codex-sdk
    config:
      output_schema: file://schemas/function-signature.json
```
## Streaming
Enable streaming to receive progress events:
```yaml
providers:
  - id: openai:codex-sdk
    config:
      enable_streaming: true
      model: gpt-5.1-codex
```
When streaming is enabled, the provider processes events such as `item.completed` and `turn.completed` to build the final response.
## Git Repository Requirement

By default, the Codex SDK requires a Git repository in the working directory, so that any changes the agent makes to your files can be tracked and reverted.
The provider validates that:

- The working directory exists and is accessible
- The working directory is a directory (not a file)
- A `.git` directory exists in the working directory
If validation fails, you'll see an error message.
To bypass this safety check:
```yaml
providers:
  - id: openai:codex-sdk
    config:
      skip_git_repo_check: true
```
## Additional Directories
Allow the Codex agent to access directories beyond the main working directory:
```yaml
providers:
  - id: openai:codex-sdk
    config:
      working_dir: ./src
      additional_directories:
        - ./tests
        - ./config
        - ./shared-libs
      model: gpt-5.1-codex
```
This is useful when the agent needs to read files from multiple locations, such as test files, configuration, or shared libraries.
## Tool Output Token Limit
Control how much output from tool calls is included in the context:
```yaml
providers:
  - id: openai:codex-sdk
    config:
      tool_output_token_limit: 20000 # Increase from default 10000
      model: gpt-5.1-codex
```
Higher limits allow more tool output but consume more context tokens.
## Custom Environment Variables
Pass custom environment variables to the Codex CLI:
```yaml
providers:
  - id: openai:codex-sdk
    config:
      cli_env:
        CUSTOM_VAR: custom-value
        ANOTHER_VAR: another-value
```
By default, the provider inherits all environment variables from the Node.js process.
## Custom Binary Path

Override the default `codex` binary location:
```yaml
providers:
  - id: openai:codex-sdk
    config:
      codex_path_override: /custom/path/to/codex
```
## Caching Behavior
This provider automatically caches responses based on:
- Prompt content
- Working directory (if specified)
- Additional directories (if specified)
- Model name
- Output schema (if specified)
- Tool output token limit (if specified)
To disable caching globally:

```sh
export PROMPTFOO_CACHE_ENABLED=false
```
To bust the cache for a specific test case, set `options.bustCache: true` in your test configuration:

```yaml
tests:
  - vars: {}
    options:
      bustCache: true
```
## Advanced Examples

### Multi-File Code Review
Review multiple files in a codebase:
```yaml
providers:
  - id: openai:codex-sdk
    config:
      working_dir: ./src
      model: gpt-5.1-codex
      max_tokens: 4000

prompts:
  - 'Review all TypeScript files in this directory and identify:
    1. Potential security vulnerabilities
    2. Performance issues
    3. Code style violations
    Return findings in JSON format'

tests:
  - assert:
      - type: is-json
      - type: javascript
        value: 'Array.isArray(output.findings)'
```
### Structured Bug Report Generation
Generate structured bug reports from code:
```yaml
providers:
  - id: openai:codex-sdk
    config:
      model: gpt-5.1-codex
      working_dir: ./test-code
      output_schema:
        type: object
        properties:
          bugs:
            type: array
            items:
              type: object
              properties:
                severity:
                  type: string
                  enum: [critical, high, medium, low]
                file:
                  type: string
                line:
                  type: number
                description:
                  type: string
                fix_suggestion:
                  type: string
              required:
                - severity
                - file
                - description
        required:
          - bugs

prompts:
  - 'Analyze the code and identify all bugs'
```
### Thread-Based Conversations
Use persistent threads for multi-turn conversations:
```yaml
providers:
  - id: openai:codex-sdk
    config:
      model: gpt-5.1-codex
      persist_threads: true
      thread_pool_size: 1

tests:
  - vars:
      request: 'Create a User class'
  - vars:
      request: 'Add a method to validate email'
  - vars:
      request: 'Add proper type hints'

prompts:
  - '{{request}}'
```
Each test reuses the same thread, maintaining context.
## Comparison with Claude Agent SDK
Both providers support code operations, but have different features:
### OpenAI Codex SDK

- Best for: Code generation, structured output, reasoning tasks
- Features: JSON schema support, thread persistence, `gpt-5.1-codex` model
- Thread management: Built-in pooling and resumption
- Working directory: Git repository validation
- Configuration: Focused on code tasks
### Claude Agent SDK

- Best for: File manipulation, system commands, MCP integration
- Features: Tool permissions, MCP servers, `CLAUDE.md` support
- Thread management: Temporary directory isolation
- Working directory: No Git requirement
- Configuration: More options for tool permissions and system access
Choose based on your use case:
- Code generation & analysis → OpenAI Codex SDK
- System operations & tooling → Claude Agent SDK
## Examples
See the examples directory for complete implementations:
- Basic usage - Simple code generation
- Agentic SDK comparison - Side-by-side comparison with Claude Agent SDK
## See Also
- OpenAI Platform Documentation
- Standard OpenAI provider - For text-only interactions
- Claude Agent SDK provider - Alternative agentic provider