Claude Agent SDK

This provider makes Claude Agent SDK available for evals through its TypeScript SDK.

Note: The Claude Agent SDK was formerly known as the Claude Code SDK. It is still built on top of Claude Code and exposes all of its functionality.

Provider IDs

You can reference this provider using either:

  • anthropic:claude-agent-sdk (full name)
  • anthropic:claude-code (alias)

Installation

The Claude Agent SDK provider requires the @anthropic-ai/claude-agent-sdk package to be installed separately:

npm install @anthropic-ai/claude-agent-sdk

Note: This is an optional dependency. Install it only if you want to use the Claude Agent SDK provider.

Setup

The easiest way to get started is with an Anthropic API key. You can set it with the ANTHROPIC_API_KEY environment variable or specify the apiKey in the provider configuration.

Create Anthropic API keys here.

Example of setting the environment variable:

export ANTHROPIC_API_KEY=your_api_key_here
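
If you would rather set the key in the provider configuration, a minimal sketch could look like the following. The apiKey parameter is the one listed under Supported Parameters; the key value is a placeholder.

```yaml
providers:
  - id: anthropic:claude-agent-sdk
    config:
      # Placeholder value; replace with your own key
      apiKey: your_api_key_here
```

The environment variable is usually the safer choice, since it keeps the key out of files that may be committed to version control.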

Other Model Providers

Apart from using the Anthropic API, you can also use AWS Bedrock and Google Vertex AI.

For AWS Bedrock:

  • Set the CLAUDE_CODE_USE_BEDROCK environment variable to true:

export CLAUDE_CODE_USE_BEDROCK=true

For Google Vertex AI:

  • Set the CLAUDE_CODE_USE_VERTEX environment variable to true:

export CLAUDE_CODE_USE_VERTEX=true

Quick Start

Basic Usage

By default, Claude Agent SDK runs in a temporary directory with no tools enabled, using the default permission mode. This makes it behave similarly to the standard Anthropic provider. It has no access to the file system (read or write) and can't run system commands.

promptfooconfig.yaml
providers:
  - anthropic:claude-agent-sdk

prompts:
  - 'Output a python function that prints the first 10 numbers in the Fibonacci sequence'

When your test cases finish, the temporary directory is deleted.

With Working Directory

You can specify a working directory for Claude Agent SDK to run in:

providers:
  - id: anthropic:claude-agent-sdk
    config:
      working_dir: ./src

prompts:
  - 'Review the TypeScript files and identify potential bugs'

This allows you to prepare a directory with files or sub-directories before running your tests.

By default, when you specify a working directory, Claude Agent SDK is given read-only access to the directory.

With Side Effects

You can also allow Claude Agent SDK to write to files, run system commands, call MCP servers, and more.

Here's an example that will allow Claude Agent SDK to both read from and write to files in the working directory. It uses append_allowed_tools to add tools for writing and editing files to the default set of read-only tools. It also sets permission_mode to acceptEdits so Claude Agent SDK can modify files without asking for confirmation.

providers:
  - id: anthropic:claude-agent-sdk
    config:
      working_dir: ./my-project
      append_allowed_tools: ['Write', 'Edit', 'MultiEdit']
      permission_mode: 'acceptEdits'

prompts:
  - 'Refactor the authentication module to use async/await'

Note: when using acceptEdits and tools that allow side effects like writing to files, you'll need to consider how you will reset the files after each test run. See the Managing Side Effects section for more information.

Supported Parameters

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| apiKey | string | Anthropic API key | Environment variable |
| working_dir | string | Directory for file operations | Temporary directory |
| model | string | Primary model to use (passed to Claude Agent SDK) | Claude Agent SDK default |
| fallback_model | string | Fallback model if primary fails | Claude Agent SDK default |
| max_turns | number | Maximum conversation turns | Claude Agent SDK default |
| max_thinking_tokens | number | Maximum tokens for thinking | Claude Agent SDK default |
| permission_mode | string | File access permissions: default, plan, acceptEdits, bypassPermissions | default |
| custom_system_prompt | string | Replace default system prompt | None |
| append_system_prompt | string | Append to default system prompt | None |
| custom_allowed_tools | string[] | Replace default allowed tools | None |
| append_allowed_tools | string[] | Add to default allowed tools | None |
| allow_all_tools | boolean | Allow all available tools | false |
| disallowed_tools | string[] | Tools to explicitly block (overrides allowed) | None |
| mcp | object | MCP server configuration | None |
| strict_mcp_config | boolean | Only allow configured MCP servers | true |
| setting_sources | string[] | Where SDK looks for settings, CLAUDE.md, and slash commands | None (disabled) |
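
As a rough illustration of how several of these parameters combine, the sketch below caps the number of turns and the thinking budget for a read-only run. The numeric values and the ./src path are arbitrary choices for illustration, not recommended settings.

```yaml
providers:
  - id: anthropic:claude-agent-sdk
    config:
      working_dir: ./src
      model: sonnet
      max_turns: 5 # Illustrative cap on conversation turns
      max_thinking_tokens: 2000 # Illustrative thinking budget
```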

Models

Model selection is optional, since Claude Agent SDK uses sensible defaults. When specified, models are passed directly to the Claude Agent SDK.

providers:
  - id: anthropic:claude-agent-sdk
    config:
      model: claude-opus-4-1-20250805
      fallback_model: claude-sonnet-4-20250514

Claude Agent SDK supports several model aliases, which you can use in place of full model names:

providers:
  - id: anthropic:claude-agent-sdk
    config:
      model: sonnet
      fallback_model: haiku

Claude Agent SDK also supports configuring models through environment variables. When using this provider, any environment variables you set will be passed through to the Claude Agent SDK.

System Prompt

Unless you specify a custom_system_prompt, the default Claude Code system prompt will be used. You can append additional instructions to it with append_system_prompt.

Note: This differs slightly from the Claude Agent SDK's behavior when used independently of Promptfoo. On its own, the Agent SDK does not use the Claude Code system prompt unless you specify it; if no system prompt is provided, it uses an empty one. To use an empty system prompt with this provider, set custom_system_prompt to an empty string.
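
The two system prompt parameters might be used as follows. This is only a sketch: the instruction text is invented for illustration, and label is promptfoo's general-purpose provider label, used here only to tell the two entries apart.

```yaml
providers:
  # Keep the default Claude Code system prompt and add an instruction to it
  - id: anthropic:claude-agent-sdk
    label: default-plus-instructions
    config:
      append_system_prompt: 'Answer concisely and mention file paths when relevant.'

  # Replace the default system prompt with an empty one
  - id: anthropic:claude-agent-sdk
    label: empty-system-prompt
    config:
      custom_system_prompt: ''
```

Running both entries against the same prompts is one way to see how much of the output depends on the default Claude Code system prompt.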

Tools and Permissions

Default Tools

If no working_dir is specified, Claude Agent SDK runs in a temporary directory with no access to tools by default.

By default, when a working_dir is specified, Claude Agent SDK has access to the following read-only tools:

  • Read - Read file contents
  • Grep - Search file contents
  • Glob - Find files by pattern
  • LS - List directory contents

Permission Modes

Control Claude Agent SDK's permissions for modifying files and running system commands:

| Mode | Description |
| --- | --- |
| default | Standard permissions |
| plan | Planning mode |
| acceptEdits | Allow file modifications |
| bypassPermissions | No restrictions |

Tool Configuration

Customize available tools for your use case:

# Add tools to defaults
providers:
  - id: anthropic:claude-agent-sdk
    config:
      append_allowed_tools: ['Write', 'Edit']

# Replace default tools entirely
providers:
  - id: anthropic:claude-agent-sdk
    config:
      custom_allowed_tools: ['Read', 'Grep', 'Glob', 'Write', 'Edit', 'MultiEdit', 'Bash', 'WebFetch', 'WebSearch']

# Block specific tools
providers:
  - id: anthropic:claude-agent-sdk
    config:
      disallowed_tools: ['Delete', 'Run']

# Allow all tools (use with caution)
providers:
  - id: anthropic:claude-agent-sdk
    config:
      allow_all_tools: true

⚠️ Security Note: Some tools allow Claude Agent SDK to modify files, run system commands, search the web, and more. Think carefully about security implications before using these tools.

Here's a full list of available tools.
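
Because disallowed_tools is documented as overriding the allowed set, one possible pattern is to open up everything and then block the tools you consider riskiest. The sketch below assumes Bash and WebFetch are the tools you want to exclude and that ./my-project is your working directory; adjust both to your own setup.

```yaml
providers:
  - id: anthropic:claude-agent-sdk
    config:
      working_dir: ./my-project
      allow_all_tools: true
      # Intended to stay blocked even though allow_all_tools is set
      disallowed_tools: ['Bash', 'WebFetch']
```

A narrower append_allowed_tools list is usually easier to reason about than a block list, so treat this as an option for exploratory runs rather than a default.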

MCP Integration

Unlike the standard Anthropic provider, Claude Agent SDK handles MCP (Model Context Protocol) connections directly. Configuration is forwarded to the Claude Agent SDK:

providers:
  - id: anthropic:claude-agent-sdk
    config:
      mcp:
        servers:
          # HTTP-based server
          - url: https://api.example.com/mcp
            name: api-server
            headers:
              Authorization: 'Bearer token'

          # Process-based server
          - command: node
            args: ['mcp-server.js']
            name: local-server

      strict_mcp_config: true # Only use configured servers (true by default)

For detailed MCP configuration, see the Claude Code MCP documentation.

Setting Sources

By default, the Claude Agent SDK provider does not look for settings files, CLAUDE.md, or slash commands. You can enable this by specifying setting_sources:

providers:
  - id: anthropic:claude-agent-sdk
    config:
      setting_sources: ['project', 'local']

Available values:

  • user - User-level settings
  • project - Project-level settings
  • local - Local directory settings

Caching Behavior

This provider automatically caches responses, and will read from the cache if the prompt, configuration, and files in the working directory (if working_dir is set) are the same as a previous run.

To disable caching globally:

export PROMPTFOO_CACHE_ENABLED=false

You can also include bustCache: true in the configuration to prevent reading from the cache.

Managing Side Effects

When using Claude Agent SDK with configurations that allow side effects, like writing to files, running system commands, or calling MCP servers, you'll need to consider:

  • How to reset after each test run
  • How to ensure tests don't interfere with each other (like writing to the same files concurrently)

This increases complexity, so first consider if you can achieve your goal with a read-only configuration. If you do need to test with side effects, here are some strategies that can help:

  • Serial execution: Set evaluateOptions.maxConcurrency: 1 in your config or use the --max-concurrency 1 CLI flag
  • Hooks: Use promptfoo extension hooks to reset the environment after each test run
  • Wrapper scripts: Handle setup/cleanup outside of promptfoo
  • Use git: If you're using a custom working directory, you can use git to reset the files after each test run
  • Use a container: Run tests that might run commands in a container to protect the host system
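
The serial execution strategy can be combined with a write-enabled provider along these lines. It is a minimal sketch that assumes a ./my-project working directory; resetting files between tests still has to be handled separately, for example with one of the other strategies in the list.

```yaml
providers:
  - id: anthropic:claude-agent-sdk
    config:
      working_dir: ./my-project
      append_allowed_tools: ['Write', 'Edit', 'MultiEdit']
      permission_mode: 'acceptEdits'

# Run one test at a time so tests do not write to the same files concurrently
evaluateOptions:
  maxConcurrency: 1
```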

Examples

Here are a few complete example implementations:

See Also