OpenAI Codex SDK

This provider makes OpenAI's Codex SDK available for evals. The Codex SDK supports code generation and manipulation with thread-based conversations and JSON schema output.

note

The OpenAI Codex SDK is a proprietary package and is not installed by default. You must install it separately.

Provider IDs

You can reference this provider using either:

  • openai:codex-sdk (full name)
  • openai:codex (alias)

Installation

The OpenAI Codex SDK provider requires the @openai/codex-sdk package to be installed separately:

npm install @openai/codex-sdk
note

This is an optional dependency and only needs to be installed if you want to use the OpenAI Codex SDK provider. Note that the codex-sdk library may have a proprietary license.

Setup

Set your OpenAI API key with the OPENAI_API_KEY environment variable or specify the apiKey in the provider configuration.

You can create OpenAI API keys in the OpenAI dashboard.

Example of setting the environment variable:

export OPENAI_API_KEY=your_api_key_here

Alternatively, you can use the CODEX_API_KEY environment variable:

export CODEX_API_KEY=your_api_key_here

Quick Start

Basic Usage

By default, the Codex SDK runs in the current working directory and requires that it be a Git repository. This safety check ensures that any code modifications the agent makes can be tracked and reverted.

promptfooconfig.yaml
providers:
  - openai:codex-sdk

prompts:
  - 'Write a Python function that calculates the factorial of a number'

The provider creates an ephemeral thread for each eval test case.

With Custom Model

Specify which OpenAI model to use for code generation:

promptfooconfig.yaml
providers:
  - id: openai:codex-sdk
    config:
      model: gpt-5.1-codex

prompts:
  - 'Write a TypeScript function that validates email addresses'

With Working Directory

Specify a custom working directory for the Codex SDK to operate in:

promptfooconfig.yaml
providers:
  - id: openai:codex-sdk
    config:
      working_dir: ./src
      model: gpt-5.1-codex

prompts:
  - 'Review the codebase and suggest improvements'

This allows you to prepare a directory with files before running your tests.

Skipping Git Check

If you need to run in a non-Git directory, you can bypass the Git repository requirement:

promptfooconfig.yaml
providers:
  - id: openai:codex-sdk
    config:
      working_dir: ./temp-workspace
      skip_git_repo_check: true
      model: gpt-5.1-codex

prompts:
  - 'Generate a README file for this project'
warning

Skipping the Git check removes a safety guard. Use with caution and consider version control for any important code.

Supported Parameters

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| apiKey | string | OpenAI API key | Environment variable |
| working_dir | string | Directory for Codex to operate in | Current directory |
| additional_directories | string[] | Additional directories the agent can access | None |
| model | string | Primary model to use | Codex SDK default |
| fallback_model | string | Fallback model if primary fails | None |
| max_tokens | number | Maximum tokens for response | Codex SDK default |
| tool_output_token_limit | number | Maximum tokens for tool output | 10000 |
| skip_git_repo_check | boolean | Skip Git repository validation | false |
| codex_path_override | string | Custom path to codex binary | None |
| thread_id | string | Resume existing thread from ~/.codex/sessions | None (creates new) |
| persist_threads | boolean | Keep threads alive between calls | false |
| thread_pool_size | number | Max concurrent threads (when persist_threads is enabled) | 1 |
| output_schema | object | JSON schema for structured responses | None |
| cli_env | object | Custom environment variables for Codex CLI | Inherits from process |
| system_prompt | string | Custom system instructions | None |
| enable_streaming | boolean | Enable streaming events | false |
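
For reference, a single configuration can combine several of these parameters. The values below are illustrative, not recommended defaults:

```yaml
providers:
  - id: openai:codex-sdk
    config:
      working_dir: ./src
      additional_directories:
        - ./tests
      model: gpt-5.1-codex
      fallback_model: gpt-5
      max_tokens: 4000
      tool_output_token_limit: 20000
      persist_threads: true
      thread_pool_size: 2
```

The API key is read from the OPENAI_API_KEY (or CODEX_API_KEY) environment variable unless apiKey is set explicitly.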

Models

The Codex SDK supports OpenAI models. Use gpt-5.1-codex for code generation tasks:

providers:
  - id: openai:codex-sdk
    config:
      model: gpt-5.1-codex
      fallback_model: gpt-5

Supported models include:

GPT-5.1 Codex Models (Recommended)

  • gpt-5.1-codex - Primary model for code generation (recommended)
  • gpt-5.1-codex-max - Frontier model with enhanced reasoning for complex tasks
  • gpt-5.1-codex-mini - Cost-efficient variant for simpler tasks

GPT-5 Models

  • gpt-5-codex - Previous generation codex model
  • gpt-5-codex-mini - Cost-efficient previous generation
  • gpt-5 - GPT-5 base model

GPT-4 Models

  • gpt-4o - GPT-4 Omni
  • gpt-4o-mini - Cost-efficient GPT-4o variant
  • gpt-4-turbo - GPT-4 Turbo
  • gpt-4 - GPT-4 base

Reasoning Models

  • o3-mini - Mini reasoning model
  • o1 - Reasoning model
  • o1-mini - Mini reasoning model

Mini Models

For faster or lower-cost evals, use mini models:

providers:
  - id: openai:codex-sdk
    config:
      model: gpt-5.1-codex-mini

Or reasoning-optimized models:

providers:
  - id: openai:codex-sdk
    config:
      model: o3-mini

Thread Management

The Codex SDK uses thread-based conversations stored in ~/.codex/sessions. Promptfoo supports three thread management modes:

Ephemeral Threads (Default)

Creates a new thread for each eval, then discards it:

providers:
  - openai:codex-sdk

Persistent Threads

Reuse threads between evals with the same configuration:

providers:
  - id: openai:codex-sdk
    config:
      persist_threads: true
      thread_pool_size: 2 # Allow up to 2 concurrent threads

Threads are pooled by cache key (working dir + model + output schema + prompt). When the pool is full, the oldest thread is evicted.
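
The pooling behavior described above can be modeled as a small keyed cache with oldest-first eviction. This is an illustrative sketch, not promptfoo's actual implementation; the Thread type and key fields are assumptions based on the description:

```typescript
// Illustrative sketch of keyed thread pooling with oldest-first eviction.
// `Thread` is a stand-in for the SDK's thread handle.
type Thread = { id: string };

class ThreadPool {
  // Map preserves insertion order, so the first key is the oldest entry.
  private pool = new Map<string, Thread>();
  constructor(private maxSize: number) {}

  // Build a cache key from the settings that distinguish threads.
  static key(workingDir: string, model: string, schema: object | null, prompt: string): string {
    return JSON.stringify([workingDir, model, schema, prompt]);
  }

  get(key: string, create: () => Thread): Thread {
    const existing = this.pool.get(key);
    if (existing) {
      // Refresh recency by re-inserting at the end.
      this.pool.delete(key);
      this.pool.set(key, existing);
      return existing;
    }
    if (this.pool.size >= this.maxSize) {
      // Pool is full: evict the oldest thread.
      const oldestKey = this.pool.keys().next().value as string;
      this.pool.delete(oldestKey);
    }
    const thread = create();
    this.pool.set(key, thread);
    return thread;
  }
}
```

Two evals with identical working directory, model, schema, and prompt would hit the same pooled thread; changing any of those fields produces a new key and a new thread.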

Thread Resumption

Resume a specific thread by ID:

providers:
  - id: openai:codex-sdk
    config:
      thread_id: abc123def456 # Thread ID from ~/.codex/sessions
      persist_threads: true # Cache the resumed thread

Structured Output

The Codex SDK supports JSON schema output. Specify an output_schema to get structured responses:

promptfooconfig.yaml
providers:
  - id: openai:codex-sdk
    config:
      model: gpt-5.1-codex
      output_schema:
        type: object
        properties:
          function_name:
            type: string
          parameters:
            type: array
            items:
              type: string
          return_type:
            type: string
        required:
          - function_name
          - parameters
          - return_type

prompts:
  - 'Describe the signature of a function that calculates fibonacci numbers'

tests:
  - assert:
      - type: is-json
      - type: javascript
        value: 'output.function_name.includes("fibonacci")'

The output will be valid JSON matching your schema.

Zod Schemas

You can also use Zod schemas converted with zod-to-json-schema:

providers:
  - id: openai:codex-sdk
    config:
      output_schema: file://schemas/function-signature.json

Streaming

Enable streaming to receive progress events:

providers:
  - id: openai:codex-sdk
    config:
      enable_streaming: true
      model: gpt-5.1-codex

When streaming is enabled, the provider processes events like item.completed and turn.completed to build the final response.
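
The accumulation described above can be sketched as a simple fold over the event stream. The event shapes below are simplified assumptions for illustration, not the SDK's exact types:

```typescript
// Illustrative sketch of assembling a final response from streamed events.
// Event shapes are simplified assumptions, not the SDK's exact types.
type StreamEvent =
  | { type: 'item.completed'; item: { text: string } }
  | { type: 'turn.completed'; usage: { total_tokens: number } };

function collectResponse(events: StreamEvent[]): { text: string; totalTokens: number } {
  let text = '';
  let totalTokens = 0;
  for (const event of events) {
    if (event.type === 'item.completed') {
      // Each completed item contributes a chunk of the final output.
      text += event.item.text;
    } else if (event.type === 'turn.completed') {
      // The turn-completion event carries usage information.
      totalTokens = event.usage.total_tokens;
    }
  }
  return { text, totalTokens };
}
```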

Git Repository Requirement

By default, the Codex SDK requires a Git repository in the working directory, so that any modifications the agent makes to your code can be tracked and reverted.

The provider validates:

  1. Working directory exists and is accessible
  2. Working directory is a directory (not a file)
  3. .git directory exists in the working directory

If validation fails, you'll see an error message.
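
The three validation steps above amount to a few filesystem checks. A minimal sketch, not the provider's actual code, might look like:

```typescript
import * as fs from 'node:fs';
import * as path from 'node:path';

// Sketch of the working-directory validation steps described above.
function validateWorkingDir(workingDir: string, skipGitRepoCheck = false): void {
  // 1. Working directory exists and is accessible.
  if (!fs.existsSync(workingDir)) {
    throw new Error(`Working directory does not exist: ${workingDir}`);
  }
  // 2. Working directory is a directory, not a file.
  if (!fs.statSync(workingDir).isDirectory()) {
    throw new Error(`Working directory is not a directory: ${workingDir}`);
  }
  // 3. A .git directory exists, unless the check is skipped.
  if (!skipGitRepoCheck && !fs.existsSync(path.join(workingDir, '.git'))) {
    throw new Error(`Not a Git repository (missing .git): ${workingDir}`);
  }
}
```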

To bypass this safety check:

providers:
  - id: openai:codex-sdk
    config:
      skip_git_repo_check: true

Additional Directories

Allow the Codex agent to access directories beyond the main working directory:

providers:
  - id: openai:codex-sdk
    config:
      working_dir: ./src
      additional_directories:
        - ./tests
        - ./config
        - ./shared-libs
      model: gpt-5.1-codex

This is useful when the agent needs to read files from multiple locations, such as test files, configuration, or shared libraries.

Tool Output Token Limit

Control how much output from tool calls is included in the context:

providers:
  - id: openai:codex-sdk
    config:
      tool_output_token_limit: 20000 # Increase from the default of 10000
      model: gpt-5.1-codex

Higher limits allow more tool output but consume more context tokens.
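
To build intuition for what such a limit does, here is an illustrative truncation sketch using the rough heuristic of ~4 characters per token. This is not the SDK's actual tokenizer or truncation logic:

```typescript
// Illustrative truncation of tool output to a token budget, using the
// rough heuristic of ~4 characters per token (not the SDK's real tokenizer).
function truncateToolOutput(output: string, tokenLimit: number): string {
  const charBudget = tokenLimit * 4;
  if (output.length <= charBudget) return output;
  return output.slice(0, charBudget) + '\n[... output truncated ...]';
}
```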

Custom Environment Variables

Pass custom environment variables to the Codex CLI:

providers:
  - id: openai:codex-sdk
    config:
      cli_env:
        CUSTOM_VAR: custom-value
        ANOTHER_VAR: another-value

By default, the provider inherits all environment variables from the Node.js process.

Custom Binary Path

Override the default codex binary location:

providers:
  - id: openai:codex-sdk
    config:
      codex_path_override: /custom/path/to/codex

Caching Behavior

This provider automatically caches responses based on:

  • Prompt content
  • Working directory (if specified)
  • Additional directories (if specified)
  • Model name
  • Output schema (if specified)
  • Tool output token limit (if specified)
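
Conceptually, a cache key over these fields can be built by hashing their serialized values. This is an illustrative sketch; the exact key format promptfoo uses internally may differ:

```typescript
import { createHash } from 'node:crypto';

// Illustrative cache-key construction over the fields listed above.
// The exact key format promptfoo uses internally may differ.
function cacheKey(parts: {
  prompt: string;
  workingDir?: string;
  additionalDirectories?: string[];
  model?: string;
  outputSchema?: object;
  toolOutputTokenLimit?: number;
}): string {
  return createHash('sha256').update(JSON.stringify(parts)).digest('hex');
}
```

The practical consequence: two test cases with identical prompt and configuration hit the cache, while changing any listed field produces a fresh request.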

To disable caching globally:

export PROMPTFOO_CACHE_ENABLED=false

To bust the cache for a specific test case, set options.bustCache: true in your test configuration:

tests:
  - vars: {}
    options:
      bustCache: true

Advanced Examples

Multi-File Code Review

Review multiple files in a codebase:

promptfooconfig.yaml
providers:
  - id: openai:codex-sdk
    config:
      working_dir: ./src
      model: gpt-5.1-codex
      max_tokens: 4000

prompts:
  - 'Review all TypeScript files in this directory and identify:
    1. Potential security vulnerabilities
    2. Performance issues
    3. Code style violations
    Return findings in JSON format'

tests:
  - assert:
      - type: is-json
      - type: javascript
        value: 'Array.isArray(output.findings)'

Structured Bug Report Generation

Generate structured bug reports from code:

promptfooconfig.yaml
providers:
  - id: openai:codex-sdk
    config:
      model: gpt-5.1-codex
      working_dir: ./test-code
      output_schema:
        type: object
        properties:
          bugs:
            type: array
            items:
              type: object
              properties:
                severity:
                  type: string
                  enum: [critical, high, medium, low]
                file:
                  type: string
                line:
                  type: number
                description:
                  type: string
                fix_suggestion:
                  type: string
              required:
                - severity
                - file
                - description
        required:
          - bugs

prompts:
  - 'Analyze the code and identify all bugs'

Thread-Based Conversations

Use persistent threads for multi-turn conversations:

promptfooconfig.yaml
providers:
  - id: openai:codex-sdk
    config:
      model: gpt-5.1-codex
      persist_threads: true
      thread_pool_size: 1

tests:
  - vars:
      request: 'Create a User class'
  - vars:
      request: 'Add a method to validate email'
  - vars:
      request: 'Add proper type hints'

prompts:
  - '{{request}}'

Each test reuses the same thread, maintaining context.

Comparison with Claude Agent SDK

Both providers support code operations, but have different features:

OpenAI Codex SDK

  • Best for: Code generation, structured output, reasoning tasks
  • Features: JSON schema support, thread persistence, gpt-5.1-codex model
  • Thread management: Built-in pooling and resumption
  • Working directory: Git repository validation
  • Configuration: Focused on code tasks

Claude Agent SDK

  • Best for: File manipulation, system commands, MCP integration
  • Features: Tool permissions, MCP servers, CLAUDE.md support
  • Thread management: Temporary directory isolation
  • Working directory: No Git requirement
  • Configuration: More options for tool permissions and system access

Choose based on your use case:

  • Code generation & analysis → OpenAI Codex SDK
  • System operations & tooling → Claude Agent SDK

Examples

See the examples directory for complete implementations.

See Also