
OpenAI Codex SDK

This provider makes OpenAI's Codex SDK available for evals. The Codex SDK supports code generation and manipulation with thread-based conversations and JSON schema output.

note

The OpenAI Codex SDK is a proprietary package and is not installed by default. You must install it separately.

Provider IDs

You can reference this provider using either:

  • openai:codex-sdk (full name)
  • openai:codex (alias)

Installation

The OpenAI Codex SDK provider requires the @openai/codex-sdk package to be installed separately:

npm install @openai/codex-sdk
note

This is an optional dependency and only needs to be installed if you want to use the OpenAI Codex SDK provider. Note that the codex-sdk library may have a proprietary license.

Setup

Set your OpenAI API key with the OPENAI_API_KEY environment variable or specify the apiKey in the provider configuration.

You can create OpenAI API keys in the OpenAI dashboard.

Example of setting the environment variable:

export OPENAI_API_KEY=your_api_key_here

Alternatively, you can use the CODEX_API_KEY environment variable:

export CODEX_API_KEY=your_api_key_here

Quick Start

Basic Usage

By default, the Codex SDK runs in the current working directory and requires a Git repository for safety, so that any changes the agent makes can be tracked and reverted.

promptfooconfig.yaml
providers:
  - openai:codex-sdk

prompts:
  - 'Write a Python function that calculates the factorial of a number'

The provider creates an ephemeral thread for each eval test case.

With Custom Model

Specify which OpenAI model to use for code generation:

promptfooconfig.yaml
providers:
  - id: openai:codex-sdk
    config:
      model: codex # Or any supported model

prompts:
  - 'Write a TypeScript function that validates email addresses'

With Working Directory

Specify a custom working directory for the Codex SDK to operate in:

promptfooconfig.yaml
providers:
  - id: openai:codex-sdk
    config:
      working_dir: ./src

prompts:
  - 'Review the codebase and suggest improvements'

This allows you to prepare a directory with files before running your tests.

Skipping Git Check

If you need to run in a non-Git directory, you can bypass the Git repository requirement:

promptfooconfig.yaml
providers:
  - id: openai:codex-sdk
    config:
      working_dir: ./temp-workspace
      skip_git_repo_check: true

prompts:
  - 'Generate a README file for this project'
warning

Skipping the Git check removes a safety guard. Use with caution and consider version control for any important code.

Supported Parameters

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| apiKey | string | OpenAI API key | Environment variable |
| base_url | string | Custom base URL for API requests (for proxies) | None |
| working_dir | string | Directory for Codex to operate in | Current directory |
| additional_directories | string[] | Additional directories the agent can access | None |
| model | string | Model to use | Codex Max |
| sandbox_mode | string | Sandbox access level (see below) | workspace-write |
| model_reasoning_effort | string | Reasoning intensity: low, medium, high | SDK default |
| network_access_enabled | boolean | Allow network requests | false |
| web_search_enabled | boolean | Allow web search | false |
| approval_policy | string | When to require approval (see below) | SDK default |
| skip_git_repo_check | boolean | Skip Git repository validation | false |
| codex_path_override | string | Custom path to codex binary | None |
| thread_id | string | Resume existing thread from ~/.codex/sessions | None (creates new) |
| persist_threads | boolean | Keep threads alive between calls | false |
| thread_pool_size | number | Max concurrent threads (when persist_threads is set) | 1 |
| output_schema | object | JSON schema for structured responses | None |
| cli_env | object | Custom environment variables for the Codex CLI | Inherits from process |
| enable_streaming | boolean | Enable streaming events | false |

Sandbox Modes

The sandbox_mode parameter controls filesystem access:

  • read-only - Agent can only read files (safest)
  • workspace-write - Agent can write to working directory (default)
  • danger-full-access - Agent has full filesystem access (use with caution)

Approval Policies

The approval_policy parameter controls when user approval is required:

  • never - Never require approval
  • on-request - Require approval when requested
  • on-failure - Require approval after failures
  • untrusted - Require approval for untrusted operations

Models

The SDK supports various OpenAI models. Use gpt-5.1-codex for code generation tasks:

providers:
  - id: openai:codex-sdk
    config:
      model: gpt-5.1-codex # Recommended for code tasks

Supported models include:

  • GPT-5.2 - Latest frontier model with improved knowledge and reasoning
  • GPT-5.1 Codex - Optimized for code generation (gpt-5.1-codex, gpt-5.1-codex-max, gpt-5.1-codex-mini)
  • GPT-5 Codex - Previous generation (gpt-5-codex, gpt-5-codex-mini)
  • GPT-4 models - General-purpose (gpt-4o, gpt-4o-mini, gpt-4-turbo)
  • Reasoning models - Enhanced reasoning (o1, o1-mini, o3-mini)

Mini Models

For faster or lower-cost evals, use mini model variants:

providers:
  - id: openai:codex-sdk
    config:
      model: gpt-5.1-codex-mini

Thread Management

The Codex SDK uses thread-based conversations stored in ~/.codex/sessions. Promptfoo supports three thread management modes:

Ephemeral Threads (Default)

Creates a new thread for each eval, then discards it:

providers:
  - openai:codex-sdk

Persistent Threads

Reuse threads between evals with the same configuration:

providers:
  - id: openai:codex-sdk
    config:
      persist_threads: true
      thread_pool_size: 2 # Allow up to 2 concurrent threads

Threads are pooled by cache key (working dir + model + output schema + prompt). When the pool is full, the oldest thread is evicted.
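The pooling behavior described above can be sketched as a small cache keyed by configuration. This is an illustrative sketch, not promptfoo's actual implementation; the ThreadPool class and its field names are assumptions for illustration:

```javascript
// Illustrative sketch of a thread pool keyed by configuration.
// A Map preserves insertion order, so the first key is always the oldest.
class ThreadPool {
  constructor(maxSize = 1) {
    this.maxSize = maxSize;
    this.threads = new Map();
  }

  // The cache key combines the settings that distinguish one thread from another.
  static cacheKey({ workingDir, model, outputSchema, prompt }) {
    return JSON.stringify([workingDir, model, outputSchema, prompt]);
  }

  // Return the cached thread for this config, or create one,
  // evicting the oldest thread when the pool is full.
  get(config, createThread) {
    const key = ThreadPool.cacheKey(config);
    if (this.threads.has(key)) return this.threads.get(key);
    if (this.threads.size >= this.maxSize) {
      const oldestKey = this.threads.keys().next().value;
      this.threads.delete(oldestKey);
    }
    const thread = createThread();
    this.threads.set(key, thread);
    return thread;
  }
}
```

Two evals with identical working directory, model, schema, and prompt would therefore share one thread, while any change to those settings allocates a new one.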

Thread Resumption

Resume a specific thread by ID:

providers:
  - id: openai:codex-sdk
    config:
      thread_id: abc123def456 # Thread ID from ~/.codex/sessions
      persist_threads: true # Cache the resumed thread

Structured Output

The Codex SDK supports JSON schema output. Specify an output_schema to get structured responses:

promptfooconfig.yaml
providers:
  - id: openai:codex-sdk
    config:
      output_schema:
        type: object
        properties:
          function_name:
            type: string
          parameters:
            type: array
            items:
              type: string
          return_type:
            type: string
        required:
          - function_name
          - parameters
          - return_type

prompts:
  - 'Describe the signature of a function that calculates fibonacci numbers'

tests:
  - assert:
      - type: is-json
      - type: javascript
        value: 'output.function_name.includes("fibonacci")'

The output will be valid JSON matching your schema.

Zod Schemas

You can also use Zod schemas: convert them to JSON Schema with zod-to-json-schema and reference the generated file:

providers:
  - id: openai:codex-sdk
    config:
      output_schema: file://schemas/function-signature.json

Streaming

Enable streaming to receive progress events:

providers:
  - id: openai:codex-sdk
    config:
      enable_streaming: true

When streaming is enabled, the provider processes events like item.completed and turn.completed to build the final response.
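A minimal sketch of how such an event stream might be folded into a final response. The event names item.completed and turn.completed come from the description above, but the payload fields (item, usage) are assumptions for illustration; the real SDK event shapes may differ:

```javascript
// Illustrative sketch: accumulate completed items until the turn finishes.
function buildResponse(events) {
  const items = [];
  for (const event of events) {
    if (event.type === 'item.completed') {
      // Collect each completed item as it streams in.
      items.push(event.item);
    } else if (event.type === 'turn.completed') {
      // The turn's completion event carries final metadata such as usage.
      return { items, usage: event.usage };
    }
  }
  // Stream ended without a turn.completed event.
  return { items, usage: null };
}
```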

Git Repository Requirement

By default, the Codex SDK requires a Git repository in the working directory, so that any changes the agent makes can be tracked and reverted.

The provider validates:

  1. Working directory exists and is accessible
  2. Working directory is a directory (not a file)
  3. .git directory exists in the working directory

If validation fails, you'll see an error message.

To bypass this safety check:

providers:
  - id: openai:codex-sdk
    config:
      skip_git_repo_check: true

Sandbox Mode

Control the level of filesystem access for the agent:

providers:
  - id: openai:codex-sdk
    config:
      sandbox_mode: read-only # Safest - agent can only read files

Available modes:

  • read-only - Agent can only read files, no modifications allowed
  • workspace-write - Agent can write to the working directory (default)
  • danger-full-access - Full filesystem access (use with extreme caution)

Web Search and Network Access

Enable the agent to search the web or make network requests:

providers:
  - id: openai:codex-sdk
    config:
      web_search_enabled: true # Allow web searches
      network_access_enabled: true # Allow network requests
warning

Enabling network access allows the agent to make arbitrary HTTP requests. Use with caution and only in trusted environments.

Model Reasoning Effort

Control how much reasoning the model uses:

providers:
  - id: openai:codex-sdk
    config:
      model_reasoning_effort: high # Thorough reasoning for complex tasks

Available levels:

  • low - Light reasoning, faster responses
  • medium - Balanced (default)
  • high - Thorough reasoning for complex tasks

Additional Directories

Allow the Codex agent to access directories beyond the main working directory:

providers:
  - id: openai:codex-sdk
    config:
      working_dir: ./src
      additional_directories:
        - ./tests
        - ./config
        - ./shared-libs

This is useful when the agent needs to read files from multiple locations, such as test files, configuration, or shared libraries.

Custom Environment Variables

Pass custom environment variables to the Codex CLI:

providers:
  - id: openai:codex-sdk
    config:
      cli_env:
        CUSTOM_VAR: custom-value
        ANOTHER_VAR: another-value

By default, the provider inherits all environment variables from the Node.js process.

Custom Binary Path

Override the default codex binary location:

providers:
  - id: openai:codex-sdk
    config:
      codex_path_override: /custom/path/to/codex

Caching Behavior

This provider automatically caches responses based on:

  • Prompt content
  • Working directory (if specified)
  • Additional directories (if specified)
  • Model name
  • Output schema (if specified)
  • Sandbox mode (if specified)
  • Model reasoning effort (if specified)
  • Network/web search settings (if specified)
  • Approval policy (if specified)

To disable caching globally:

export PROMPTFOO_CACHE_ENABLED=false

To bust the cache for a specific test case, set options.bustCache: true in your test configuration:

tests:
  - vars: {}
    options:
      bustCache: true

Advanced Examples

Multi-File Code Review

Review multiple files in a codebase with enhanced reasoning:

promptfooconfig.yaml
providers:
  - id: openai:codex-sdk
    config:
      working_dir: ./src
      model_reasoning_effort: high # Use thorough reasoning for code review

prompts:
  - |
    Review all TypeScript files in this directory and identify:
    1. Potential security vulnerabilities
    2. Performance issues
    3. Code style violations
    Return findings in JSON format

tests:
  - assert:
      - type: is-json
      - type: javascript
        value: 'Array.isArray(output.findings)'

Structured Bug Report Generation

Generate structured bug reports from code:

promptfooconfig.yaml
providers:
  - id: openai:codex-sdk
    config:
      working_dir: ./test-code
      output_schema:
        type: object
        properties:
          bugs:
            type: array
            items:
              type: object
              properties:
                severity:
                  type: string
                  enum: [critical, high, medium, low]
                file:
                  type: string
                line:
                  type: number
                description:
                  type: string
                fix_suggestion:
                  type: string
              required:
                - severity
                - file
                - description
        required:
          - bugs

prompts:
  - 'Analyze the code and identify all bugs'

Thread-Based Conversations

Use persistent threads for multi-turn conversations:

promptfooconfig.yaml
providers:
  - id: openai:codex-sdk
    config:
      persist_threads: true
      thread_pool_size: 1

tests:
  - vars:
      request: 'Create a User class'
  - vars:
      request: 'Add a method to validate email'
  - vars:
      request: 'Add proper type hints'

prompts:
  - '{{request}}'

Each test reuses the same thread, maintaining context.

Comparison with Claude Agent SDK

Both providers support code operations, but have different features:

OpenAI Codex SDK

  • Best for: Code generation, structured output, reasoning tasks
  • Features: JSON schema support, thread persistence, Codex models
  • Thread management: Built-in pooling and resumption
  • Working directory: Git repository validation
  • Configuration: Focused on code tasks

Claude Agent SDK

  • Best for: File manipulation, system commands, MCP integration
  • Features: Tool permissions, MCP servers, CLAUDE.md support
  • Thread management: Temporary directory isolation
  • Working directory: No Git requirement
  • Configuration: More options for tool permissions and system access

Choose based on your use case:

  • Code generation & analysis → OpenAI Codex SDK
  • System operations & tooling → Claude Agent SDK

Examples

See the examples directory for complete implementations.
