# Databricks Foundation Model APIs
The Databricks provider integrates with Databricks' Foundation Model APIs, offering access to state-of-the-art models through a unified OpenAI-compatible interface. It supports multiple deployment modes to match your specific use case and performance requirements.
## Overview
Databricks Foundation Model APIs provide three main deployment options:
- Pay-per-token endpoints: Pre-configured endpoints for popular models with usage-based pricing
- Provisioned throughput: Dedicated endpoints with guaranteed performance for production workloads
- External models: Unified access to models from providers like OpenAI, Anthropic, and Google through Databricks
## Prerequisites
- A Databricks workspace with Foundation Model APIs enabled
- A Databricks access token for authentication
- Your workspace URL (e.g., `https://your-workspace.cloud.databricks.com`)
Set up your environment:
```bash
export DATABRICKS_WORKSPACE_URL=https://your-workspace.cloud.databricks.com
export DATABRICKS_TOKEN=your-token-here
```
## Basic Usage

### Pay-per-token Endpoints
Access pre-configured Foundation Model endpoints with simple configuration:
```yaml
providers:
  - id: databricks:databricks-meta-llama-3-3-70b-instruct
    config:
      isPayPerToken: true
      workspaceUrl: https://your-workspace.cloud.databricks.com
```
Available pay-per-token models include:
- `databricks-meta-llama-3-3-70b-instruct` - Meta's latest Llama model
- `databricks-claude-3-7-sonnet` - Anthropic Claude with reasoning capabilities
- `databricks-gte-large-en` - Text embeddings model
- `databricks-dbrx-instruct` - Databricks' own foundation model
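
For a quick smoke test, a minimal end-to-end config can be as small as the sketch below (the prompt, test variable, and assertion are illustrative, not part of the provider API):

```yaml
# Minimal eval sketch against a pay-per-token endpoint. Assumes
# DATABRICKS_WORKSPACE_URL and DATABRICKS_TOKEN are exported as shown above.
prompts:
  - 'Summarize in one sentence: {{text}}'

providers:
  - id: databricks:databricks-meta-llama-3-3-70b-instruct
    config:
      isPayPerToken: true

tests:
  - vars:
      text: 'Databricks Foundation Model APIs expose hosted models through an OpenAI-compatible interface.'
    assert:
      - type: contains
        value: 'Databricks'
```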
### Provisioned Throughput Endpoints
For production workloads requiring guaranteed performance:
```yaml
providers:
  - id: databricks:my-custom-endpoint
    config:
      workspaceUrl: https://your-workspace.cloud.databricks.com
      temperature: 0.7
      max_tokens: 500
```
### External Models
Access external models through Databricks' unified API:
```yaml
providers:
  - id: databricks:my-openai-endpoint
    config:
      workspaceUrl: https://your-workspace.cloud.databricks.com
      # External model endpoints proxy to providers like OpenAI, Anthropic, etc.
```
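
Because routing happens inside Databricks, the promptfoo config looks the same regardless of which provider backs the endpoint; only the endpoint name changes. A sketch comparing two hypothetical external endpoints (both must already be configured in your workspace):

```yaml
providers:
  # Hypothetical endpoint names; each is an external model endpoint
  # you have already created via the Databricks serving UI or API.
  - id: databricks:my-openai-endpoint # proxies to an OpenAI model
    config:
      workspaceUrl: https://your-workspace.cloud.databricks.com
  - id: databricks:my-anthropic-endpoint # proxies to an Anthropic model
    config:
      workspaceUrl: https://your-workspace.cloud.databricks.com
```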
## Configuration Options
The Databricks provider extends the OpenAI configuration options with these Databricks-specific features:
| Parameter | Description | Default |
| --- | --- | --- |
| `workspaceUrl` | Databricks workspace URL. Can also be set via the `DATABRICKS_WORKSPACE_URL` environment variable | - |
| `isPayPerToken` | Whether this is a pay-per-token endpoint (`true`) or a custom deployed endpoint (`false`) | `false` |
| `usageContext` | Optional metadata for usage tracking and cost attribution | - |
| `aiGatewayConfig` | AI Gateway features configuration (safety filters, PII handling) | - |
## Advanced Configuration
```yaml
providers:
  - id: databricks:databricks-claude-3-7-sonnet
    config:
      isPayPerToken: true
      workspaceUrl: https://your-workspace.cloud.databricks.com

      # Standard OpenAI parameters
      temperature: 0.7
      max_tokens: 2000
      top_p: 0.9

      # Usage tracking for cost attribution
      usageContext:
        project: 'customer-support'
        team: 'engineering'
        environment: 'production'

      # AI Gateway features (if enabled on the endpoint)
      aiGatewayConfig:
        enableSafety: true
        piiHandling: 'mask' # Options: none, block, mask
```
## Environment Variables
| Variable | Description |
| --- | --- |
| `DATABRICKS_WORKSPACE_URL` | Your Databricks workspace URL |
| `DATABRICKS_TOKEN` | Authentication token for Databricks API access |
## Features

### Vision Models
Vision models on Databricks require structured JSON prompts similar to OpenAI's format. Here's how to use them:
```yaml
prompts:
  - file://vision-prompt.json

providers:
  - id: databricks:databricks-claude-3-7-sonnet
    config:
      isPayPerToken: true

tests:
  - vars:
      question: "What's in this image?"
      image_url: 'https://example.com/image.jpg'
```
Create a `vision-prompt.json` file with the proper format:
```json
[
  {
    "role": "user",
    "content": [
      {
        "type": "text",
        "text": "{{question}}"
      },
      {
        "type": "image_url",
        "image_url": {
          "url": "{{image_url}}"
        }
      }
    ]
  }
]
```
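
If the image isn't publicly reachable, one option is to inline it as a base64 data URI in the test variable. This is a sketch that assumes the serving endpoint accepts data URIs, as OpenAI-compatible vision APIs generally do:

```yaml
tests:
  - vars:
      question: "What's in this image?"
      # Assumption: the endpoint accepts base64 data URIs like OpenAI's API.
      # The base64 payload here is truncated for illustration.
      image_url: 'data:image/jpeg;base64,/9j/4AAQSkZJRg...'
```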
### Structured Outputs
Get responses in a specific JSON schema:
```yaml
providers:
  - id: databricks:databricks-meta-llama-3-3-70b-instruct
    config:
      isPayPerToken: true
      response_format:
        type: 'json_schema'
        json_schema:
          name: 'product_info'
          schema:
            type: 'object'
            properties:
              name:
                type: 'string'
              price:
                type: 'number'
            required: ['name', 'price']
```
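
To check that responses actually conform, you can pair this with promptfoo's `is-json` assertion, passing the same schema (a sketch; the schema is repeated from the provider config above):

```yaml
tests:
  - assert:
      # Fails the test if the output is not valid JSON matching the schema.
      - type: is-json
        value:
          type: object
          properties:
            name:
              type: string
            price:
              type: number
          required: ['name', 'price']
```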
## Monitoring and Usage Tracking
Track usage and costs with detailed context:
```yaml
providers:
  - id: databricks:databricks-meta-llama-3-3-70b-instruct
    config:
      isPayPerToken: true
      usageContext:
        application: 'chatbot'
        customer_id: '12345'
        request_type: 'support_query'
        priority: 'high'
```
Usage data is available through Databricks system tables:
- `system.serving.endpoint_usage` - Token usage and request metrics
- `system.serving.served_entities` - Endpoint metadata
## Best Practices
1. **Choose the right deployment mode**:
   - Use pay-per-token for experimentation and low-volume use cases
   - Use provisioned throughput for production workloads requiring SLAs
   - Use external models when you need specific providers' capabilities
2. **Enable AI Gateway features for production endpoints**:
   - Safety guardrails prevent harmful content
   - PII detection protects sensitive data
   - Rate limiting controls costs and prevents abuse
3. **Implement proper error handling**:
   - Pay-per-token endpoints may have rate limits; consider throttling your evals (see the sketch after this list)
   - Provisioned endpoints may have tokens-per-second limits
   - External model endpoints inherit provider-specific limitations
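
On the promptfoo side, one way to stay under rate limits is to throttle the eval itself. A sketch using promptfoo's `evaluateOptions` (the values are illustrative starting points, not Databricks-specific limits):

```yaml
evaluateOptions:
  maxConcurrency: 1 # serialize requests instead of running them in parallel
  delay: 1000 # wait 1000 ms between API calls
```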
## Example: Multi-Model Comparison
```yaml
prompts:
  - 'Explain quantum computing to a 10-year-old'

providers:
  # Databricks native model
  - id: databricks:databricks-meta-llama-3-3-70b-instruct
    config:
      isPayPerToken: true
      temperature: 0.7

  # External model via Databricks
  - id: databricks:my-gpt4-endpoint
    config:
      temperature: 0.7

  # Custom deployed model
  - id: databricks:my-finetuned-llama
    config:
      temperature: 0.7

tests:
  - assert:
      - type: llm-rubric
        value: 'Response should be simple, clear, and use age-appropriate analogies'
```
## Troubleshooting
Common issues and solutions:
- **Authentication errors**: Verify that your `DATABRICKS_TOKEN` has the necessary permissions
- **Endpoint not found**:
  - For pay-per-token: ensure you're using the exact endpoint name (e.g., `databricks-meta-llama-3-3-70b-instruct`)
  - For custom endpoints: verify the endpoint exists and is running
- **Rate limiting**: Pay-per-token endpoints have usage limits; consider provisioned throughput for high-volume use
- **Token count errors**: Some models have specific token limits; adjust `max_tokens` accordingly