# Helicone AI Gateway

Helicone AI Gateway is an open-source, self-hosted AI gateway that provides a unified OpenAI-compatible interface for 100+ LLM providers. The Helicone provider in promptfoo allows you to route requests through a locally running Helicone AI Gateway instance.

## Benefits

- **Unified Interface**: Use OpenAI SDK syntax to access 100+ LLM providers
- **Load Balancing**: Smart provider selection based on latency, cost, or custom strategies
- **Caching**: Intelligent response caching to reduce costs and improve performance
- **Rate Limiting**: Built-in rate limiting and usage controls
- **Observability**: Optional integration with Helicone's observability platform
- **Self-Hosted**: Run your own gateway instance for full control

## Setup

### Start Helicone AI Gateway

First, start a local Helicone AI Gateway instance:

```bash
# Set your provider API keys
export OPENAI_API_KEY=your_openai_key
export ANTHROPIC_API_KEY=your_anthropic_key
export GROQ_API_KEY=your_groq_key

# Start the gateway
npx @helicone/ai-gateway@latest
```

The gateway will start on `http://localhost:8080` by default.
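Before wiring the gateway into promptfoo, you can confirm it is reachable. A minimal smoke test, assuming the default `/ai` route (described under Gateway Options below) and that at least one provider key was exported above:

```bash
# Send a test chat completion through the gateway's OpenAI-compatible /ai route.
# Models are addressed as provider/model.
curl http://localhost:8080/ai/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "messages": [{"role": "user", "content": "Say hello"}]
  }'
```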

### Installation

No additional dependencies are required. The Helicone provider is built into promptfoo and works with any running Helicone AI Gateway instance.

## Usage

### Basic Usage

To route requests through your local Helicone AI Gateway:

```yaml
providers:
  - helicone:openai/gpt-4o-mini
  - helicone:anthropic/claude-3-5-sonnet
  - helicone:groq/llama-3.1-8b-instant
```

The model format is `provider/model`, as supported by the Helicone AI Gateway.

### Custom Configuration

For more advanced configuration:

```yaml
providers:
  - id: helicone:openai/gpt-4o
    config:
      # Gateway configuration
      baseUrl: http://localhost:8080 # Custom gateway URL
      router: production # Use a specific router
      # Standard OpenAI options
      temperature: 0.7
      max_tokens: 1500
      headers:
        Custom-Header: 'custom-value'
```

### Using Custom Routers

If your Helicone AI Gateway is configured with custom routers:

```yaml
providers:
  - id: helicone:openai/gpt-4o
    config:
      router: production
  - id: helicone:openai/gpt-3.5-turbo
    config:
      router: development
```

## Configuration Options

### Provider Format

The Helicone provider uses the format `helicone:provider/model`.

Examples:

- `helicone:openai/gpt-4o`
- `helicone:anthropic/claude-3-5-sonnet`
- `helicone:groq/llama-3.1-8b-instant`

### Supported Models

The Helicone AI Gateway supports 100+ models from various providers. Some popular examples:

| Provider  | Example Models                                                    |
| --------- | ----------------------------------------------------------------- |
| OpenAI    | `openai/gpt-4o`, `openai/gpt-4o-mini`, `openai/o1-preview`        |
| Anthropic | `anthropic/claude-3-5-sonnet`, `anthropic/claude-3-haiku`         |
| Groq      | `groq/llama-3.1-8b-instant`, `groq/llama-3.1-70b-versatile`       |
| Meta      | `meta-llama/Llama-3-8b-chat-hf`, `meta-llama/Llama-3-70b-chat-hf` |
| Google    | `google/gemma-7b-it`, `google/gemma-2b-it`                        |

For a complete list, see the Helicone AI Gateway documentation.

### Configuration Parameters

#### Gateway Options

- `baseUrl` (string): Helicone AI Gateway URL (defaults to `http://localhost:8080`)
- `router` (string): Custom router name (optional; the `/ai` endpoint is used if not specified)
- `model` (string): Override the model name from the provider specification
- `apiKey` (string): Custom API key (defaults to `placeholder-api-key`)
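A hedged sketch combining all four gateway options in one provider entry; the values are illustrative placeholders, not recommended defaults:

```yaml
providers:
  - id: helicone:openai/gpt-4o-mini
    config:
      baseUrl: http://localhost:8080 # where your gateway instance listens
      router: production # route via a named router instead of the /ai endpoint
      model: openai/gpt-4o # overrides the model parsed from the provider id
      apiKey: placeholder-api-key # override only if your gateway expects a real key
```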

#### OpenAI-Compatible Options

Since the provider extends OpenAI's chat completion provider, all standard OpenAI options are supported:

- `temperature`: Controls randomness (0.0 to 1.0)
- `max_tokens`: Maximum number of tokens to generate
- `top_p`: Nucleus sampling parameter
- `frequency_penalty`: Penalizes frequently repeated tokens
- `presence_penalty`: Penalizes tokens that have already appeared in the output
- `stop`: Stop sequences
- `headers`: Additional HTTP headers
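For instance, a sketch exercising the sampling and penalty options together (all values are illustrative):

```yaml
providers:
  - id: helicone:openai/gpt-4o-mini
    config:
      temperature: 0.2 # mostly deterministic output
      top_p: 0.9 # nucleus sampling cutoff
      max_tokens: 256 # cap on generated tokens
      frequency_penalty: 0.5 # discourage frequently repeated tokens
      presence_penalty: 0.3 # discourage tokens that already appeared
      stop: ["\n\n"] # stop at the first blank line
      headers:
        X-Experiment: 'sampling-sweep' # illustrative custom header
```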

## Examples

### Basic OpenAI Integration

```yaml
providers:
  - helicone:openai/gpt-4o-mini

prompts:
  - "Translate '{{text}}' to French"

tests:
  - vars:
      text: 'Hello world'
    assert:
      - type: contains
        value: 'Bonjour'
```

### Multi-Provider Comparison with Observability

```yaml
providers:
  - id: helicone:openai/gpt-4o
    config:
      tags: ['openai', 'gpt4']
      properties:
        model_family: 'gpt-4'

  - id: helicone:anthropic/claude-3-5-sonnet-20241022
    config:
      tags: ['anthropic', 'claude']
      properties:
        model_family: 'claude-3'

prompts:
  - 'Write a creative story about {{topic}}'

tests:
  - vars:
      topic: 'a robot learning to paint'
```

### Custom Provider with Full Configuration

```yaml
providers:
  - id: helicone:openai/gpt-4o
    config:
      baseUrl: https://custom-gateway.example.com:8080
      router: production
      apiKey: your_custom_api_key
      temperature: 0.7
      max_tokens: 1000
      headers:
        Authorization: Bearer your_target_provider_api_key
        Custom-Header: custom-value

prompts:
  - 'Answer the following question: {{question}}'

tests:
  - vars:
      question: 'What is artificial intelligence?'
```

### Caching and Performance Optimization

```yaml
providers:
  - id: helicone:openai/gpt-3.5-turbo
    config:
      cache: true
      properties:
        cache_strategy: 'aggressive'
        use_case: 'batch_processing'

prompts:
  - 'Summarize: {{text}}'

tests:
  - vars:
      text: 'Large text content to summarize...'
    assert:
      - type: latency
        threshold: 2000 # Should be faster due to caching
```

## Features

### Request Monitoring

With Helicone's observability integration enabled, requests routed through the gateway are logged with:

- Request/response payloads
- Token usage and costs
- Latency metrics
- Custom properties and tags
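As noted under Benefits, the observability integration is optional. A hedged sketch of enabling it: this assumes that exporting `HELICONE_API_KEY` before starting the gateway turns on reporting to the Helicone platform, consistent with the Troubleshooting note below:

```bash
# Assumption: a Helicone API key in the environment enables the gateway's
# optional observability integration (see Troubleshooting if auth fails).
export HELICONE_API_KEY=your_helicone_key
npx @helicone/ai-gateway@latest
```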

### Cost Analytics

Track costs across different providers and models:

- Per-request cost breakdown
- Aggregated cost analytics
- Cost optimization recommendations

### Caching

Intelligent response caching:

- Semantic similarity matching
- Configurable cache duration
- Cost reduction through cache hits

### Rate Limiting

Built-in rate limiting:

- Per-user limits
- Per-session limits
- Custom rate limiting rules

## Best Practices

1. **Use Meaningful Tags**: Tag your requests with relevant metadata for better analytics
2. **Track Sessions**: Use session IDs to track conversation flows
3. **Enable Caching**: For repeated or similar requests, enable caching to reduce costs
4. **Monitor Costs**: Regularly review cost analytics in the Helicone dashboard
5. **Custom Properties**: Use custom properties to segment and analyze your usage
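A sketch tying several of these practices together in one provider entry; the property names (`session_id`, `use_case`) are illustrative, not a fixed schema:

```yaml
providers:
  - id: helicone:openai/gpt-4o-mini
    config:
      cache: true # practice 3: reuse responses for repeated prompts
      tags: ['eval', 'translation'] # practice 1: meaningful tags
      properties:
        session_id: 'eval-run-001' # practice 2: illustrative session id
        use_case: 'regression-suite' # practice 5: segment usage in analytics
```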

## Troubleshooting

### Common Issues

1. **Authentication Failed**: Ensure your `HELICONE_API_KEY` is set correctly
2. **Unknown Provider**: Check that the provider is in the supported list or use a custom `targetUrl`
3. **Request Timeout**: Check your network connection and target provider availability

### Debug Mode

Enable debug logging to see detailed request/response information:

```bash
LOG_LEVEL=debug promptfoo eval
```