Databricks Foundation Model APIs

The Databricks provider integrates with Databricks' Foundation Model APIs, offering access to state-of-the-art models through a unified OpenAI-compatible interface. It supports multiple deployment modes to match your specific use case and performance requirements.

Overview

Databricks Foundation Model APIs provide three main deployment options:

  1. Pay-per-token endpoints: Pre-configured endpoints for popular models with usage-based pricing
  2. Provisioned throughput: Dedicated endpoints with guaranteed performance for production workloads
  3. External models: Unified access to models from providers like OpenAI, Anthropic, and Google through Databricks

Prerequisites

  1. A Databricks workspace with Foundation Model APIs enabled
  2. A Databricks access token for authentication
  3. Your workspace URL (e.g., https://your-workspace.cloud.databricks.com)

Set up your environment:

export DATABRICKS_WORKSPACE_URL=https://your-workspace.cloud.databricks.com
export DATABRICKS_TOKEN=your-token-here
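
Once both variables are exported, workspaceUrl can be omitted from the config; the provider falls back to DATABRICKS_WORKSPACE_URL (see the configuration table below). A minimal sketch using a pay-per-token endpoint:

promptfooconfig.yaml
providers:
  - id: databricks:databricks-meta-llama-3-3-70b-instruct
    config:
      isPayPerToken: true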

Basic Usage

Pay-per-token Endpoints

Access pre-configured Foundation Model endpoints with simple configuration:

promptfooconfig.yaml
providers:
  - id: databricks:databricks-meta-llama-3-3-70b-instruct
    config:
      isPayPerToken: true
      workspaceUrl: https://your-workspace.cloud.databricks.com

Available pay-per-token models include:

  • databricks-meta-llama-3-3-70b-instruct - Meta's Llama 3.3 70B instruct model
  • databricks-claude-3-7-sonnet - Anthropic Claude with reasoning capabilities
  • databricks-gte-large-en - Text embedding model (see the similarity example after this list)
  • databricks-dbrx-instruct - Databricks' own foundation model
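
The embedding endpoint can back promptfoo's similar assertion. A sketch, assuming your workspace exposes the pay-per-token databricks-gte-large-en endpoint; the expected value and threshold are illustrative:

promptfooconfig.yaml
defaultTest:
  options:
    provider:
      embedding:
        id: databricks:databricks-gte-large-en
        config:
          isPayPerToken: true

tests:
  - assert:
      - type: similar
        value: 'Paris is the capital of France'
        threshold: 0.8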

Provisioned Throughput Endpoints

For production workloads requiring guaranteed performance:

providers:
  - id: databricks:my-custom-endpoint
    config:
      workspaceUrl: https://your-workspace.cloud.databricks.com
      temperature: 0.7
      max_tokens: 500

External Models

Access external models through Databricks' unified API:

providers:
  - id: databricks:my-openai-endpoint
    config:
      workspaceUrl: https://your-workspace.cloud.databricks.com
      # External model endpoints proxy to providers like OpenAI, Anthropic, etc.

Configuration Options

The Databricks provider extends the standard OpenAI configuration options with these Databricks-specific parameters:

| Parameter | Description | Default |
| --- | --- | --- |
| workspaceUrl | Databricks workspace URL. Can also be set via the DATABRICKS_WORKSPACE_URL environment variable | - |
| isPayPerToken | Whether this is a pay-per-token endpoint (true) or a custom deployed endpoint (false) | false |
| usageContext | Optional metadata for usage tracking and cost attribution | - |
| aiGatewayConfig | AI Gateway features configuration (safety filters, PII handling) | - |

Advanced Configuration

promptfooconfig.yaml
providers:
  - id: databricks:databricks-claude-3-7-sonnet
    config:
      isPayPerToken: true
      workspaceUrl: https://your-workspace.cloud.databricks.com

      # Standard OpenAI parameters
      temperature: 0.7
      max_tokens: 2000
      top_p: 0.9

      # Usage tracking for cost attribution
      usageContext:
        project: 'customer-support'
        team: 'engineering'
        environment: 'production'

      # AI Gateway features (if enabled on endpoint)
      aiGatewayConfig:
        enableSafety: true
        piiHandling: 'mask' # Options: none, block, mask

Environment Variables

| Variable | Description |
| --- | --- |
| DATABRICKS_WORKSPACE_URL | Your Databricks workspace URL |
| DATABRICKS_TOKEN | Authentication token for Databricks API access |

Features

Vision Models

Vision models on Databricks require structured JSON prompts similar to OpenAI's format. Here's how to use them:

promptfooconfig.yaml
prompts:
  - file://vision-prompt.json

providers:
  - id: databricks:databricks-claude-3-7-sonnet
    config:
      isPayPerToken: true

tests:
  - vars:
      question: "What's in this image?"
      image_url: 'https://example.com/image.jpg'

Create a vision-prompt.json file with the proper format:

vision-prompt.json
[
  {
    "role": "user",
    "content": [
      {
        "type": "text",
        "text": "{{question}}"
      },
      {
        "type": "image_url",
        "image_url": {
          "url": "{{image_url}}"
        }
      }
    ]
  }
]

Structured Outputs

Get responses in a specific JSON schema:

providers:
  - id: databricks:databricks-meta-llama-3-3-70b-instruct
    config:
      isPayPerToken: true
      response_format:
        type: 'json_schema'
        json_schema:
          name: 'product_info'
          schema:
            type: 'object'
            properties:
              name:
                type: 'string'
              price:
                type: 'number'
            required: ['name', 'price']
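
To verify that responses actually conform, the same schema can be repeated in promptfoo's is-json assertion; a minimal sketch, mirroring the schema above:

tests:
  - assert:
      - type: is-json
        value:
          type: 'object'
          properties:
            name:
              type: 'string'
            price:
              type: 'number'
          required: ['name', 'price']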

Monitoring and Usage Tracking

Track usage and costs with detailed context:

providers:
  - id: databricks:databricks-meta-llama-3-3-70b-instruct
    config:
      isPayPerToken: true
      usageContext:
        application: 'chatbot'
        customer_id: '12345'
        request_type: 'support_query'
        priority: 'high'

Usage data is available through Databricks system tables:

  • system.serving.endpoint_usage - Token usage and request metrics
  • system.serving.served_entities - Endpoint metadata

Best Practices

  1. Choose the right deployment mode:

    • Use pay-per-token for experimentation and low-volume use cases
    • Use provisioned throughput for production workloads requiring SLAs
    • Use external models when you need specific providers' capabilities
  2. Enable AI Gateway features for production endpoints:

    • Safety guardrails prevent harmful content
    • PII detection protects sensitive data
    • Rate limiting controls costs and prevents abuse
  3. Implement proper error handling (a throttling sketch follows this list):

    • Pay-per-token endpoints may have rate limits
    • Provisioned endpoints may have token-per-second limits
    • External model endpoints inherit provider-specific limitations
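
For the rate limits called out in item 3, promptfoo's evaluateOptions can throttle an eval so pay-per-token limits aren't hit; a sketch with an illustrative concurrency value:

promptfooconfig.yaml
evaluateOptions:
  maxConcurrency: 2 # cap parallel requests to stay under endpoint rate limits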

Example: Multi-Model Comparison

promptfooconfig.yaml
prompts:
  - 'Explain quantum computing to a 10-year-old'

providers:
  # Databricks native model
  - id: databricks:databricks-meta-llama-3-3-70b-instruct
    config:
      isPayPerToken: true
      temperature: 0.7

  # External model via Databricks
  - id: databricks:my-gpt4-endpoint
    config:
      temperature: 0.7

  # Custom deployed model
  - id: databricks:my-finetuned-llama
    config:
      temperature: 0.7

tests:
  - assert:
      - type: llm-rubric
        value: 'Response should be simple, clear, and use age-appropriate analogies'

Troubleshooting

Common issues and solutions:

  1. Authentication errors: Verify your DATABRICKS_TOKEN has the necessary permissions
  2. Endpoint not found:
    • For pay-per-token: Ensure you're using the exact endpoint name (e.g., databricks-meta-llama-3-3-70b-instruct)
    • For custom endpoints: Verify the endpoint exists and is running
  3. Rate limiting: Pay-per-token endpoints have usage limits; consider provisioned throughput for high-volume use
  4. Token count errors: Some models have specific token limits; adjust max_tokens accordingly

Additional Resources