
Cerebras

This provider enables you to use Cerebras models through their Inference API.

Cerebras offers an OpenAI-compatible API for various large language models including Llama models, DeepSeek, and more. You can use it as a drop-in replacement for applications currently using the OpenAI API chat endpoints.
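Because the API is OpenAI-compatible, existing OpenAI client code can typically be pointed at Cerebras just by overriding the base URL. A minimal sketch using the openai Python package (the model and prompt are illustrative; see Setup below for the API key):

from openai import OpenAI

# Reuse the standard OpenAI client by pointing it at the Cerebras endpoint.
client = OpenAI(
    api_key="your_api_key_here",
    base_url="https://api.cerebras.ai/v1",
)

response = client.chat.completions.create(
    model="llama3.1-8b",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)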

Setup

Generate an API key from the Cerebras platform. Then set the CEREBRAS_API_KEY environment variable or pass it via the apiKey configuration field.

export CEREBRAS_API_KEY=your_api_key_here

Or in your config:

providers:
  - id: cerebras:llama3.1-8b
    config:
      apiKey: your_api_key_here

Provider Format

The Cerebras provider uses a simple format:

  • cerebras:<model name> - uses the chat completion interface for all models
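
For example, to evaluate Llama 3.1 8B through the chat interface:

providers:
  - cerebras:llama3.1-8b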

Available Models

The Cerebras Inference API officially supports these models:

  • llama-4-scout-17b-16e-instruct - Llama 4 Scout 17B model with 16 experts (MoE)
  • llama3.1-8b - Llama 3.1 8B model
  • llama-3.3-70b - Llama 3.3 70B model
  • deepseek-r1-distill-llama-70b - DeepSeek R1 Distill Llama 70B model (private preview)

To get the current list of available models, use the /models endpoint:

curl https://api.cerebras.ai/v1/models -H "Authorization: Bearer your_api_key_here"

Parameters

The provider accepts standard OpenAI chat parameters:

  • temperature - Controls randomness (0.0 to 1.5)
  • max_completion_tokens - Maximum number of tokens to generate
  • top_p - Nucleus sampling parameter
  • stop - Sequences where the API will stop generating further tokens
  • seed - Seed for deterministic generation
  • response_format - Controls the format of the model response (e.g., for JSON output)
  • logprobs - Whether to return log probabilities of the output tokens
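
These map directly onto the provider config. For example (values are illustrative):

providers:
  - id: cerebras:llama3.1-8b
    config:
      temperature: 0.7
      max_completion_tokens: 512
      top_p: 0.9
      seed: 42
      stop: ['###']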

Advanced Capabilities

Structured Outputs

Cerebras models support structured outputs with JSON schema enforcement to ensure your AI-generated responses follow a consistent, predictable format. This makes it easier to build reliable applications that can process AI outputs programmatically.

To use structured outputs, set the response_format parameter to include a JSON schema:

providers:
  - id: cerebras:llama-4-scout-17b-16e-instruct
    config:
      response_format:
        type: 'json_schema'
        json_schema:
          name: 'movie_schema'
          strict: true
          schema:
            type: 'object'
            properties:
              title: { 'type': 'string' }
              director: { 'type': 'string' }
              year: { 'type': 'integer' }
            required: ['title', 'director', 'year']
            additionalProperties: false

Alternatively, you can use simple JSON mode by setting response_format to {"type": "json_object"}.
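
In config form, JSON mode looks like this:

providers:
  - id: cerebras:llama3.1-8b
    config:
      response_format:
        type: 'json_object'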

Tool Use

Cerebras models support tool use (function calling), which lets the model request calls to functions you define so your application can execute specific tasks on its behalf. To use this feature, define the tools the model can use:

providers:
  - id: cerebras:llama-4-scout-17b-16e-instruct
    config:
      tools:
        - type: 'function'
          function:
            name: 'calculate'
            description: 'A calculator that can perform basic arithmetic operations'
            parameters:
              type: 'object'
              properties:
                expression:
                  type: 'string'
                  description: 'The mathematical expression to evaluate'
              required: ['expression']
            strict: true

When using tool calling, you'll need to process the model's response and handle any tool calls it makes, then provide the results back to the model for the final response.
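
Outside of promptfoo, that loop looks roughly like the following sketch, again using the openai Python package against the OpenAI-compatible endpoint (the eval-based calculate handler is a toy stand-in, not a production approach):

import json
from openai import OpenAI

client = OpenAI(api_key="your_api_key_here", base_url="https://api.cerebras.ai/v1")

tools = [{
    "type": "function",
    "function": {
        "name": "calculate",
        "description": "A calculator that can perform basic arithmetic operations",
        "parameters": {
            "type": "object",
            "properties": {
                "expression": {
                    "type": "string",
                    "description": "The mathematical expression to evaluate",
                },
            },
            "required": ["expression"],
        },
    },
}]

messages = [{"role": "user", "content": "What is 17 * 23?"}]
response = client.chat.completions.create(
    model="llama-4-scout-17b-16e-instruct", messages=messages, tools=tools
)
msg = response.choices[0].message

# If the model requested a tool call, run it and send the result back.
if msg.tool_calls:
    messages.append(msg)
    for call in msg.tool_calls:
        args = json.loads(call.function.arguments)
        result = str(eval(args["expression"]))  # toy handler; never eval untrusted input
        messages.append({"role": "tool", "tool_call_id": call.id, "content": result})
    final = client.chat.completions.create(
        model="llama-4-scout-17b-16e-instruct", messages=messages, tools=tools
    )
    print(final.choices[0].message.content)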

Example Configuration

# yaml-language-server: $schema=https://promptfoo.dev/config-schema.json
description: Cerebras model evaluation

prompts:
  - You are an expert in {{topic}}. Explain {{question}} in simple terms.

providers:
  - id: cerebras:llama3.1-8b
    config:
      temperature: 0.7
      max_completion_tokens: 1024
  - id: cerebras:llama-3.3-70b
    config:
      temperature: 0.7
      max_completion_tokens: 1024

tests:
  - vars:
      topic: quantum computing
      question: Explain quantum entanglement in simple terms
    assert:
      - type: contains-any
        value: ['entangled', 'correlated', 'quantum state']
  - vars:
      topic: machine learning
      question: What is the difference between supervised and unsupervised learning?
    assert:
      - type: contains
        value: 'labeled data'
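
Save this as promptfooconfig.yaml and run the evaluation with:

npx promptfoo@latest eval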
