xAI (Grok)

The xai provider gives access to xAI's Grok models through an OpenAI-compatible API. Depending on the model used, it supports both text and vision capabilities.

Setup

To use xAI's API, set the XAI_API_KEY environment variable or specify via apiKey in the configuration file.

```sh
export XAI_API_KEY=your_api_key_here
```
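Alternatively, the key can live in the provider config itself (the value below is a placeholder):

```yaml
providers:
  - id: xai:grok-3-mini-beta
    config:
      apiKey: your_api_key_here
```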

Supported Models

The xAI provider supports the following models:

Grok 4.1 Fast Models

  • xai:grok-4-1-fast-reasoning - Frontier model optimized for agentic tool calling with reasoning (2M context)
  • xai:grok-4-1-fast-non-reasoning - Fast variant for instant responses without reasoning (2M context)
  • xai:grok-4-1-fast - Alias for grok-4-1-fast-reasoning
  • xai:grok-4-1-fast-latest - Alias for grok-4-1-fast-reasoning

Grok Code Fast Models

  • xai:grok-code-fast-1 - Speedy and economical reasoning model optimized for agentic coding (256K context)
  • xai:grok-code-fast - Alias for grok-code-fast-1
  • xai:grok-code-fast-1-0825 - Specific version of the code-fast model (256K context)

Grok-4 Fast Models

  • xai:grok-4-fast-reasoning - Fast reasoning model with 2M context window
  • xai:grok-4-fast-non-reasoning - Fast non-reasoning model for instant responses (2M context)
  • xai:grok-4-fast - Alias for grok-4-fast-reasoning
  • xai:grok-4-fast-latest - Alias for grok-4-fast-reasoning

Grok-4 Models

  • xai:grok-4-0709 - Flagship reasoning model (256K context)
  • xai:grok-4 - Alias for latest Grok-4 model
  • xai:grok-4-latest - Alias for latest Grok-4 model

Grok-3 Models

  • xai:grok-3-beta - Latest flagship model for enterprise tasks (131K context)
  • xai:grok-3-fast-beta - Fastest flagship model (131K context)
  • xai:grok-3-mini-beta - Smaller model for basic tasks, supports reasoning effort (32K context)
  • xai:grok-3-mini-fast-beta - Faster mini model, supports reasoning effort (32K context)
  • xai:grok-3 - Alias for grok-3-beta
  • xai:grok-3-latest - Alias for grok-3-beta
  • xai:grok-3-fast - Alias for grok-3-fast-beta
  • xai:grok-3-fast-latest - Alias for grok-3-fast-beta
  • xai:grok-3-mini - Alias for grok-3-mini-beta
  • xai:grok-3-mini-latest - Alias for grok-3-mini-beta
  • xai:grok-3-mini-fast - Alias for grok-3-mini-fast-beta
  • xai:grok-3-mini-fast-latest - Alias for grok-3-mini-fast-beta

Grok-2 and Earlier Models

  • xai:grok-2-latest - Latest Grok-2 model (131K context)
  • xai:grok-2-vision-latest - Latest Grok-2 vision model (32K context)
  • xai:grok-beta - Beta version (131K context)
  • xai:grok-vision-beta - Vision beta version (8K context)

You can also use specific versioned models:

  • xai:grok-2-1212
  • xai:grok-2-vision-1212

Configuration

The provider supports all OpenAI provider configuration options plus Grok-specific options. Example usage:

promptfooconfig.yaml

```yaml
# yaml-language-server: $schema=https://promptfoo.dev/config-schema.json
providers:
  - id: xai:grok-3-mini-beta
    config:
      temperature: 0.7
      reasoning_effort: 'high' # Only for grok-3-mini models
      apiKey: your_api_key_here # Alternative to XAI_API_KEY
```

Reasoning Support

Multiple Grok models support reasoning capabilities:

Grok Code Fast Models: The grok-code-fast-1 family consists of reasoning models optimized for agentic coding workflows. They support:

  • Function calling and tool usage
  • Web search via search_parameters
  • Fast inference with built-in reasoning

Grok-3 Models: The grok-3-mini-beta and grok-3-mini-fast-beta models support reasoning through the reasoning_effort parameter:

  • reasoning_effort: "low" - Minimal thinking time, using fewer tokens for quick responses
  • reasoning_effort: "high" - Maximum thinking time, leveraging more tokens for complex problems
Note: For Grok-3, reasoning is only available for the mini variants. The standard grok-3-beta and grok-3-fast-beta models do not support reasoning.

Grok 4.1 Fast Specific Behavior

Grok 4.1 Fast is xAI's frontier model specifically optimized for agentic tool calling:

  • Two variants: grok-4-1-fast-reasoning for maximum intelligence, grok-4-1-fast-non-reasoning for instant responses
  • Massive context window: 2,000,000 tokens for handling complex multi-turn agent interactions
  • Optimized for tool calling: Trained specifically for high-performance agentic tool calling via RL in simulated environments
  • Low latency and cost: $0.20/1M input tokens, $0.50/1M output tokens with fast inference
  • Unsupported parameters: Same restrictions as Grok-4 (no presence_penalty, frequency_penalty, stop, reasoning_effort)

promptfooconfig.yaml

```yaml
# yaml-language-server: $schema=https://promptfoo.dev/config-schema.json
providers:
  - id: xai:grok-4-1-fast-reasoning
    config:
      temperature: 0.7
      max_completion_tokens: 4096
```

Grok-4 Fast Specific Behavior

Grok-4 Fast models offer the same capabilities as Grok-4 but with faster inference and lower cost:

  • Two variants: grok-4-fast-reasoning for reasoning tasks, grok-4-fast-non-reasoning for instant responses
  • 2M context window: Same large context as Grok 4.1 Fast
  • Same parameter restrictions as Grok-4: No presence_penalty, frequency_penalty, stop, or reasoning_effort

promptfooconfig.yaml

```yaml
# yaml-language-server: $schema=https://promptfoo.dev/config-schema.json
providers:
  - id: xai:grok-4-fast-reasoning
    config:
      temperature: 0.7
      max_completion_tokens: 4096
```

Grok-4 Specific Behavior

Grok-4 introduces significant changes compared to previous Grok models:

  • Always uses reasoning: Grok-4 is a reasoning model that always operates at maximum reasoning capacity
  • No reasoning_effort parameter: Unlike Grok-3 mini models, Grok-4 does not support the reasoning_effort parameter
  • Unsupported parameters: The following parameters are not supported and will be automatically filtered out:
    • presencePenalty / presence_penalty
    • frequencyPenalty / frequency_penalty
    • stop
  • Larger context window: 256,000 tokens compared to 131,072 for Grok-3 models
  • Uses max_completion_tokens: As a reasoning model, Grok-4 uses max_completion_tokens instead of max_tokens

promptfooconfig.yaml

```yaml
# yaml-language-server: $schema=https://promptfoo.dev/config-schema.json
providers:
  - id: xai:grok-4
    config:
      temperature: 0.7
      max_completion_tokens: 4096
```

Grok Code Fast Specific Behavior

The Grok Code Fast models are optimized for agentic coding workflows and offer several key features:

  • Built for Speed: Designed to be highly responsive for agentic coding tools where multiple tool calls are common
  • Economical Pricing: At $0.20/1M input tokens and $1.50/1M output tokens, significantly more affordable than flagship models
  • Reasoning Capabilities: Built-in reasoning for code analysis, debugging, and problem-solving
  • Tool Integration: Excellent support for function calling, tool usage, and web search
  • Coding Expertise: Particularly adept at TypeScript, Python, Java, Rust, C++, and Go

promptfooconfig.yaml

```yaml
# yaml-language-server: $schema=https://promptfoo.dev/config-schema.json
providers:
  - id: xai:grok-code-fast-1
    # or use the alias:
    # - id: xai:grok-code-fast
    config:
      temperature: 0.1 # Lower temperature often preferred for coding tasks
      max_completion_tokens: 4096
      search_parameters:
        mode: auto # Enable web search for coding assistance
```

Region Support

You can specify a region to use a region-specific API endpoint:

```yaml
providers:
  - id: xai:grok-2-latest
    config:
      region: us-west-1 # Will use https://us-west-1.api.x.ai/v1
```

This is equivalent to setting base_url="https://us-west-1.api.x.ai/v1" in the Python client.

Live Search (Beta)

Deprecation Notice

xAI has announced that the Live Search API (via search_parameters) will be deprecated by December 15, 2025. The replacement is the Agent Tools API, which provides enhanced agentic search capabilities. Agent Tools require the Responses API endpoint - see the Agent Tools API section for more details.

You can optionally enable Grok's Live Search feature to let the model pull in real-time information from the web or X. Pass a search_parameters object in your provider config. The mode field controls how search is used:

  • off – Disable search
  • auto – Model decides when to search (default)
  • on – Always perform live search

Additional fields like sources, from_date, to_date, and return_citations may also be provided.

promptfooconfig.yaml

```yaml
providers:
  - id: xai:grok-3-beta
    config:
      search_parameters:
        mode: auto
        return_citations: true
        sources:
          - type: web
```

For a full list of options see the xAI documentation.

Agent Tools API (Responses API)

Use the xai:responses:<model> provider to access xAI's Agent Tools API, which enables autonomous server-side tool execution for web search, X search, and code interpretation.

promptfooconfig.yaml

```yaml
providers:
  - id: xai:responses:grok-4-1-fast-reasoning
    config:
      temperature: 0.7
      max_output_tokens: 4096
      tools:
        - type: web_search
        - type: x_search
```

Available Agent Tools

| Tool | Description |
| --- | --- |
| web_search | Search the web and browse pages |
| x_search | Search X posts, users, and threads |
| code_interpreter | Execute Python code in a sandbox |
| collections_search | Search uploaded knowledge bases |
| mcp | Connect to remote MCP servers |

Web Search Tool

```yaml
tools:
  - type: web_search
    filters:
      allowed_domains:
        - example.com
        - news.com
      # OR excluded_domains (cannot use both)
    enable_image_understanding: true
```

X Search Tool

```yaml
tools:
  - type: x_search
    from_date: '2025-01-01' # ISO8601 format
    to_date: '2025-11-27'
    allowed_x_handles:
      - elonmusk
    enable_image_understanding: true
    enable_video_understanding: true
```

Code Interpreter Tool

```yaml
tools:
  - type: code_interpreter
    container:
      pip_packages:
        - numpy
        - pandas
```

Complete Example

promptfooconfig.yaml

```yaml
providers:
  - id: xai:responses:grok-4-fast
    config:
      temperature: 0.7
      tools:
        - type: web_search
          enable_image_understanding: true
        - type: x_search
          from_date: '2025-01-01'
        - type: code_interpreter
          container:
            pip_packages:
              - numpy
      tool_choice: auto # auto, required, or none
      parallel_tool_calls: true

tests:
  - vars:
      question: What's the latest AI news? Search the web and X.
    assert:
      - type: contains
        value: AI
```

Responses API Configuration

| Parameter | Type | Description |
| --- | --- | --- |
| temperature | number | Sampling temperature (0-2) |
| max_output_tokens | number | Maximum tokens to generate |
| top_p | number | Nucleus sampling parameter |
| tools | array | Agent tools to enable |
| tool_choice | string | Tool selection mode: auto, required, none |
| parallel_tool_calls | boolean | Allow parallel tool execution |
| instructions | string | System-level instructions |
| previous_response_id | string | For multi-turn conversations |
| store | boolean | Store response for later retrieval |
| response_format | object | JSON schema for structured output |
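For example, instructions, max_output_tokens, and store from the table above can be combined in a single provider entry (the values here are placeholders):

```yaml
providers:
  - id: xai:responses:grok-4-fast-reasoning
    config:
      instructions: 'You are a concise research assistant.'
      max_output_tokens: 1024
      store: true
```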

Supported Models

The Responses API works with Grok 4 models:

  • grok-4-1-fast-reasoning (recommended for agentic workflows)
  • grok-4-1-fast-non-reasoning
  • grok-4-fast-reasoning
  • grok-4-fast-non-reasoning
  • grok-4

Migrating from Live Search

If you're using Live Search via search_parameters, migrate to the Responses API before December 15, 2025:

Before (Live Search - deprecated):

```yaml
providers:
  - id: xai:grok-4-1-fast-reasoning
    config:
      search_parameters:
        mode: auto
        sources:
          - type: web
          - type: x
```

After (Responses API):

```yaml
providers:
  - id: xai:responses:grok-4-1-fast-reasoning
    config:
      tools:
        - type: web_search
        - type: x_search
```

Deferred Chat Completions

Not Yet Supported

xAI offers Deferred Chat Completions for long-running requests that can be retrieved asynchronously via a request_id. This feature is not yet supported in promptfoo. For async workflows, use the xAI Python SDK directly.

Function Calling

xAI supports standard OpenAI-compatible function calling for client-side tool execution:

promptfooconfig.yaml

```yaml
providers:
  - id: xai:grok-4-1-fast-reasoning
    config:
      tools:
        - type: function
          function:
            name: get_weather
            description: Get the current weather for a location
            parameters:
              type: object
              properties:
                location:
                  type: string
                  description: City and state
              required:
                - location
```

Structured Outputs

xAI supports structured outputs via JSON schema:

promptfooconfig.yaml

```yaml
providers:
  - id: xai:grok-4
    config:
      response_format:
        type: json_schema
        json_schema:
          name: analysis_result
          strict: true
          schema:
            type: object
            properties:
              summary:
                type: string
              confidence:
                type: number
            required:
              - summary
              - confidence
            additionalProperties: false
```
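Because the schema is enforced, downstream tests can assert on the output directly; a sketch using promptfoo's built-in is-json assertion (the test var is a placeholder):

```yaml
tests:
  - vars:
      document: 'Quarterly revenue grew 12% year over year ...'
    assert:
      - type: is-json
```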

Vision Support

For models with vision capabilities, you can include images in your prompts using the same format as OpenAI. Create a prompt.yaml file:

prompt.yaml

```yaml
- role: user
  content:
    - type: image_url
      image_url:
        url: '{{image_url}}'
        detail: 'high'
    - type: text
      text: '{{question}}'
```

Then reference it in your promptfoo config:

promptfooconfig.yaml

```yaml
# yaml-language-server: $schema=https://promptfoo.dev/config-schema.json
prompts:
  - file://prompt.yaml

providers:
  - id: xai:grok-2-vision-latest

tests:
  - vars:
      image_url: 'https://example.com/image.jpg'
      question: "What's in this image?"
```

Image Generation

xAI also supports image generation through the Grok image model:

```yaml
providers:
  - xai:image:grok-2-image
```

Example configuration for image generation:

promptfooconfig.yaml

```yaml
# yaml-language-server: $schema=https://promptfoo.dev/config-schema.json
prompts:
  - 'A {{style}} painting of {{subject}}'

providers:
  - id: xai:image:grok-2-image
    config:
      n: 1 # Number of images to generate (1-10)
      response_format: 'url' # 'url' or 'b64_json'

tests:
  - vars:
      style: 'impressionist'
      subject: 'sunset over mountains'
```

For more information on the available models and API usage, refer to the xAI documentation.

Examples

For examples demonstrating text generation, image creation, and web search, see the xai example.

You can run this example with:

```sh
npx promptfoo@latest init --example xai
```

Troubleshooting

502 Bad Gateway Errors

If you encounter 502 Bad Gateway errors when using the xAI provider, this typically indicates:

  • An invalid or missing API key
  • Server issues on x.ai's side

The provider returns descriptive error messages to help you diagnose the cause.

Solution: Verify your XAI_API_KEY environment variable is set correctly. You can obtain an API key from https://x.ai/.

Controlling Retries

If you're experiencing timeouts or want to control retry behavior:

  • To disable retries for 5XX errors: PROMPTFOO_RETRY_5XX=false
  • To reduce retry delays: PROMPTFOO_REQUEST_BACKOFF_MS=1000 (in milliseconds)
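Both are plain environment variables, so they can be exported before running an eval (the eval command itself is commented out here):

```shell
# Disable automatic retries on 5XX responses and shorten the backoff window
export PROMPTFOO_RETRY_5XX=false
export PROMPTFOO_REQUEST_BACKOFF_MS=1000
# npx promptfoo@latest eval   # run your eval with these settings applied
```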
