# Helicone AI Gateway
Helicone AI Gateway is an open-source, self-hosted AI gateway that provides a unified OpenAI-compatible interface for 100+ LLM providers. The Helicone provider in promptfoo allows you to route requests through a locally running Helicone AI Gateway instance.
## Benefits

- **Unified Interface**: Use OpenAI SDK syntax to access 100+ different LLM providers
- **Load Balancing**: Smart provider selection based on latency, cost, or custom strategies
- **Caching**: Intelligent response caching to reduce costs and improve performance
- **Rate Limiting**: Built-in rate limiting and usage controls
- **Observability**: Optional integration with Helicone's observability platform
- **Self-Hosted**: Run your own gateway instance for full control
## Setup

### Start Helicone AI Gateway
First, start a local Helicone AI Gateway instance:
```bash
# Set your provider API keys
export OPENAI_API_KEY=your_openai_key
export ANTHROPIC_API_KEY=your_anthropic_key
export GROQ_API_KEY=your_groq_key

# Start the gateway
npx @helicone/ai-gateway@latest
```
The gateway will start on `http://localhost:8080` by default.
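To verify the gateway is running, you can send a test request to its OpenAI-compatible chat completions endpoint. A minimal sketch, assuming the default `/ai` route:

```bash
# Smoke test against the default unified endpoint (/ai);
# the exact path is an assumption - adjust it if your gateway uses a custom router.
curl http://localhost:8080/ai/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "openai/gpt-4o-mini", "messages": [{"role": "user", "content": "Say hello"}]}'
```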
## Installation
No additional dependencies are required. The Helicone provider is built into promptfoo and works with any running Helicone AI Gateway instance.
## Usage

### Basic Usage
To route requests through your local Helicone AI Gateway:
```yaml
providers:
  - helicone:openai/gpt-4o-mini
  - helicone:anthropic/claude-3-5-sonnet
  - helicone:groq/llama-3.1-8b-instant
```
The model format is `provider/model`, as supported by the Helicone AI Gateway.
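With the gateway running and providers configured, run the evaluation as usual:

```bash
promptfoo eval
```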
### Custom Configuration
For more advanced configuration:
```yaml
providers:
  - id: helicone:openai/gpt-4o
    config:
      # Gateway configuration
      baseUrl: http://localhost:8080 # Custom gateway URL
      router: production # Use a specific router

      # Standard OpenAI options
      temperature: 0.7
      max_tokens: 1500
      headers:
        Custom-Header: 'custom-value'
```
### Using a Custom Router
If your Helicone AI Gateway is configured with custom routers:
```yaml
providers:
  - id: helicone:openai/gpt-4o
    config:
      router: production
  - id: helicone:openai/gpt-3.5-turbo
    config:
      router: development
```
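The router name determines which gateway endpoint promptfoo sends requests to. A rough sketch of the mapping, assuming routers are exposed under a `/router/<name>` path (verify against your gateway's configuration):

```bash
# router: production  ->  http://localhost:8080/router/production/chat/completions  (assumed path)
# router not set      ->  http://localhost:8080/ai/chat/completions                 (default endpoint)
```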
## Configuration Options

### Provider Format

The Helicone provider uses the format: `helicone:provider/model`

Examples:

- `helicone:openai/gpt-4o`
- `helicone:anthropic/claude-3-5-sonnet`
- `helicone:groq/llama-3.1-8b-instant`
### Supported Models

The Helicone AI Gateway supports 100+ models from various providers. Some popular examples:

| Provider  | Example Models |
| --------- | -------------- |
| OpenAI    | `openai/gpt-4o`, `openai/gpt-4o-mini`, `openai/o1-preview` |
| Anthropic | `anthropic/claude-3-5-sonnet`, `anthropic/claude-3-haiku` |
| Groq      | `groq/llama-3.1-8b-instant`, `groq/llama-3.1-70b-versatile` |
| Meta      | `meta-llama/Llama-3-8b-chat-hf`, `meta-llama/Llama-3-70b-chat-hf` |
| Google    | `google/gemma-7b-it`, `google/gemma-2b-it` |
For a complete list, see the Helicone AI Gateway documentation.
### Configuration Parameters

#### Gateway Options

- `baseUrl` (string): Helicone AI Gateway URL (defaults to `http://localhost:8080`)
- `router` (string): Custom router name (optional; uses the `/ai` endpoint if not specified)
- `model` (string): Override the model name from the provider specification
- `apiKey` (string): Custom API key (defaults to `placeholder-api-key`)
#### OpenAI-Compatible Options
Since the provider extends OpenAI's chat completion provider, all standard OpenAI options are supported:
- `temperature`: Controls randomness (0.0 to 2.0)
- `max_tokens`: Maximum number of tokens to generate
- `top_p`: Nucleus sampling parameter
- `frequency_penalty`: Penalizes frequent tokens
- `presence_penalty`: Penalizes new tokens based on presence
- `stop`: Stop sequences
- `headers`: Additional HTTP headers
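For example, several of these options combined in one provider entry (values are illustrative; `X-Experiment` is a hypothetical custom header):

```yaml
providers:
  - id: helicone:openai/gpt-4o-mini
    config:
      temperature: 0.2
      max_tokens: 512
      top_p: 0.9
      stop: ['END']
      headers:
        X-Experiment: 'baseline' # hypothetical custom header
```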
## Examples

### Basic OpenAI Integration

```yaml
providers:
  - helicone:openai/gpt-4o-mini

prompts:
  - "Translate '{{text}}' to French"

tests:
  - vars:
      text: 'Hello world'
    assert:
      - type: contains
        value: 'Bonjour'
```
### Multi-Provider Comparison with Observability

```yaml
providers:
  - id: helicone:openai/gpt-4o
    config:
      tags: ['openai', 'gpt4']
      properties:
        model_family: 'gpt-4'
  - id: helicone:anthropic/claude-3-5-sonnet-20241022
    config:
      tags: ['anthropic', 'claude']
      properties:
        model_family: 'claude-3'

prompts:
  - 'Write a creative story about {{topic}}'

tests:
  - vars:
      topic: 'a robot learning to paint'
```
### Custom Provider with Full Configuration

```yaml
providers:
  - id: helicone:openai/gpt-4o
    config:
      baseUrl: https://custom-gateway.example.com:8080
      router: production
      apiKey: your_custom_api_key
      temperature: 0.7
      max_tokens: 1000
      headers:
        Authorization: Bearer your_target_provider_api_key
        Custom-Header: custom-value

prompts:
  - 'Answer the following question: {{question}}'

tests:
  - vars:
      question: 'What is artificial intelligence?'
```
### Caching and Performance Optimization

```yaml
providers:
  - id: helicone:openai/gpt-3.5-turbo
    config:
      cache: true
      properties:
        cache_strategy: 'aggressive'
        use_case: 'batch_processing'

prompts:
  - 'Summarize: {{text}}'

tests:
  - vars:
      text: 'Large text content to summarize...'
    assert:
      - type: latency
        threshold: 2000 # Should be faster due to caching
```
## Features

### Request Monitoring
All requests routed through Helicone are automatically logged with:
- Request/response payloads
- Token usage and costs
- Latency metrics
- Custom properties and tags
### Cost Analytics
Track costs across different providers and models:
- Per-request cost breakdown
- Aggregated cost analytics
- Cost optimization recommendations
### Caching
Intelligent response caching:
- Semantic similarity matching
- Configurable cache duration
- Cost reduction through cache hits
### Rate Limiting
Built-in rate limiting:
- Per-user limits
- Per-session limits
- Custom rate limiting rules
## Best Practices
- **Use Meaningful Tags**: Tag your requests with relevant metadata for better analytics (see the sketch after this list)
- **Track Sessions**: Use session IDs to track conversation flows
- **Enable Caching**: For repeated or similar requests, enable caching to reduce costs
- **Monitor Costs**: Regularly review cost analytics in the Helicone dashboard
- **Custom Properties**: Use custom properties to segment and analyze your usage
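A sketch combining tags, custom properties, and a session ID (the `tags` and `properties` keys follow the examples above; `Helicone-Session-Id` is an assumed header name based on Helicone's header conventions, so verify it in the Helicone docs):

```yaml
providers:
  - id: helicone:openai/gpt-4o-mini
    config:
      tags: ['translation', 'nightly-eval']
      properties:
        team: 'localization' # custom property for segmenting analytics
      headers:
        Helicone-Session-Id: 'eval-run-001' # assumed header name; verify in the Helicone docs
```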
## Troubleshooting

### Common Issues

- **Authentication Failed**: Ensure your `HELICONE_API_KEY` is set correctly
- **Unknown Provider**: Check that the provider is in the supported list, or use a custom `targetUrl`
- **Request Timeout**: Check your network connection and target provider availability
### Debug Mode

Enable debug logging to see detailed request/response information:

```bash
LOG_LEVEL=debug promptfoo eval
```