# Helicone AI Gateway
Helicone AI Gateway is an open-source, self-hosted AI gateway that provides a unified OpenAI-compatible interface for 100+ LLM providers. The Helicone provider in promptfoo allows you to route requests through a locally running Helicone AI Gateway instance.
## Benefits

- **Unified Interface**: Use OpenAI SDK syntax to access 100+ different LLM providers
- **Load Balancing**: Smart provider selection based on latency, cost, or custom strategies
- **Caching**: Intelligent response caching to reduce costs and improve performance
- **Rate Limiting**: Built-in rate limiting and usage controls
- **Observability**: Optional integration with Helicone's observability platform
- **Self-Hosted**: Run your own gateway instance for full control
## Setup

### Start Helicone AI Gateway
First, start a local Helicone AI Gateway instance:
```bash
# Set your provider API keys
export OPENAI_API_KEY=your_openai_key
export ANTHROPIC_API_KEY=your_anthropic_key
export GROQ_API_KEY=your_groq_key

# Start the gateway
npx @helicone/ai-gateway@latest
```
The gateway will start on `http://localhost:8080` by default.
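To verify the gateway is running, you can send a test request to its OpenAI-compatible chat completions endpoint. A minimal sketch, assuming the default `/ai` route:

```bash
# Smoke test against the default unified endpoint (/ai);
# the exact path is an assumption - adjust it if your gateway uses a custom router.
curl http://localhost:8080/ai/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "openai/gpt-4o-mini", "messages": [{"role": "user", "content": "Say hello"}]}'
```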
## Installation
No additional dependencies are required. The Helicone provider is built into promptfoo and works with any running Helicone AI Gateway instance.
## Usage

### Basic Usage
To route requests through your local Helicone AI Gateway:
```yaml
providers:
  - helicone:openai/gpt-4o-mini
  - helicone:anthropic/claude-3-5-sonnet
  - helicone:groq/llama-3.1-8b-instant
```
The model format is `provider/model`, as supported by the Helicone AI Gateway.
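With the gateway running and providers configured, run the evaluation as usual:

```bash
promptfoo eval
```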
### Custom Configuration
For more advanced configuration:
```yaml
providers:
  - id: helicone:openai/gpt-4o
    config:
      # Gateway configuration
      baseUrl: http://localhost:8080 # Custom gateway URL
      router: production # Use a specific router

      # Standard OpenAI options
      temperature: 0.7
      max_tokens: 1500
      headers:
        Custom-Header: 'custom-value'
```
### Using a Custom Router
If your Helicone AI Gateway is configured with custom routers:
```yaml
providers:
  - id: helicone:openai/gpt-4o
    config:
      router: production
  - id: helicone:openai/gpt-3.5-turbo
    config:
      router: development
```
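The router name determines which gateway endpoint promptfoo sends requests to. A rough sketch of the mapping, assuming routers are exposed under a `/router/<name>` path (verify against your gateway's configuration):

```bash
# router: production  ->  http://localhost:8080/router/production/chat/completions  (assumed path)
# router not set      ->  http://localhost:8080/ai/chat/completions                 (default endpoint)
```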
## Configuration Options

### Provider Format

The Helicone provider uses the format: `helicone:provider/model`

Examples:

- `helicone:openai/gpt-4o`
- `helicone:anthropic/claude-3-5-sonnet`
- `helicone:groq/llama-3.1-8b-instant`
### Supported Models

The Helicone AI Gateway supports 100+ models from various providers. Some popular examples:

| Provider  | Example Models |
| --------- | -------------- |
| OpenAI    | `openai/gpt-4o`, `openai/gpt-4o-mini`, `openai/o1-preview` |
| Anthropic | `anthropic/claude-3-5-sonnet`, `anthropic/claude-3-haiku` |
| Groq      | `groq/llama-3.1-8b-instant`, `groq/llama-3.1-70b-versatile` |
| Meta      | `meta-llama/Llama-3-8b-chat-hf`, `meta-llama/Llama-3-70b-chat-hf` |
| Google    | `google/gemma-7b-it`, `google/gemma-2b-it` |
For a complete list, see the Helicone AI Gateway documentation.
### Configuration Parameters

#### Gateway Options

- `baseUrl` (string): Helicone AI Gateway URL (defaults to `http://localhost:8080`)
- `router` (string): Custom router name (optional; uses the `/ai` endpoint if not specified)
- `model` (string): Override the model name from the provider specification
- `apiKey` (string): Custom API key (defaults to `placeholder-api-key`)
#### OpenAI-Compatible Options
Since the provider extends OpenAI's chat completion provider, all standard OpenAI options are supported:
- `temperature`: Controls randomness (0.0 to 2.0)
- `max_tokens`: Maximum number of tokens to generate
- `top_p`: Nucleus sampling parameter
- `frequency_penalty`: Penalizes frequent tokens
- `presence_penalty`: Penalizes new tokens based on presence
- `stop`: Stop sequences
- `headers`: Additional HTTP headers
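For example, several of these options combined in one provider entry (values are illustrative; `X-Experiment` is a hypothetical custom header):

```yaml
providers:
  - id: helicone:openai/gpt-4o-mini
    config:
      temperature: 0.2
      max_tokens: 512
      top_p: 0.9
      stop: ['END']
      headers:
        X-Experiment: 'baseline' # hypothetical custom header
```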
## Examples

### Basic OpenAI Integration

```yaml
providers:
  - helicone:openai/gpt-4o-mini

prompts:
  - "Translate '{{text}}' to French"

tests:
  - vars:
      text: 'Hello world'
    assert:
      - type: contains
        value: 'Bonjour'
```
### Multi-Provider Comparison with Observability

```yaml
providers:
  - id: helicone:openai/gpt-4o
    config:
      tags: ['openai', 'gpt4']
      properties:
        model_family: 'gpt-4'
  - id: helicone:anthropic/claude-3-5-sonnet-20241022
    config:
      tags: ['anthropic', 'claude']
      properties:
        model_family: 'claude-3'

prompts:
  - 'Write a creative story about {{topic}}'

tests:
  - vars:
      topic: 'a robot learning to paint'
```
### Custom Provider with Full Configuration

```yaml
providers:
  - id: helicone:openai/gpt-4o
    config:
      baseUrl: https://custom-gateway.example.com:8080
      router: production
      apiKey: your_custom_api_key
      temperature: 0.7
      max_tokens: 1000
      headers:
        Authorization: Bearer your_target_provider_api_key
        Custom-Header: custom-value

prompts:
  - 'Answer the following question: {{question}}'

tests:
  - vars:
      question: 'What is artificial intelligence?'
```
### Caching and Performance Optimization

```yaml
providers:
  - id: helicone:openai/gpt-3.5-turbo
    config:
      cache: true
      properties:
        cache_strategy: 'aggressive'
        use_case: 'batch_processing'

prompts:
  - 'Summarize: {{text}}'

tests:
  - vars:
      text: 'Large text content to summarize...'
    assert:
      - type: latency
        threshold: 2000 # Should be faster due to caching
```
## Features

### Request Monitoring
All requests routed through Helicone are automatically logged with:
- Request/response payloads
- Token usage and costs
- Latency metrics
- Custom properties and tags
### Cost Analytics
Track costs across different providers and models:
- Per-request cost breakdown
- Aggregated cost analytics
- Cost optimization recommendations
### Caching
Intelligent response caching:
- Semantic similarity matching
- Configurable cache duration
- Cost reduction through cache hits
### Rate Limiting
Built-in rate limiting:
- Per-user limits
- Per-session limits
- Custom rate limiting rules
## Best Practices
- **Use Meaningful Tags**: Tag your requests with relevant metadata for better analytics (see the sketch after this list)
- **Track Sessions**: Use session IDs to track conversation flows
- **Enable Caching**: For repeated or similar requests, enable caching to reduce costs
- **Monitor Costs**: Regularly review cost analytics in the Helicone dashboard
- **Custom Properties**: Use custom properties to segment and analyze your usage
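A sketch combining tags, custom properties, and a session ID (the `tags` and `properties` keys follow the examples above; `Helicone-Session-Id` is an assumed header name based on Helicone's header conventions, so verify it in the Helicone docs):

```yaml
providers:
  - id: helicone:openai/gpt-4o-mini
    config:
      tags: ['translation', 'nightly-eval']
      properties:
        team: 'localization' # custom property for segmenting analytics
      headers:
        Helicone-Session-Id: 'eval-run-001' # assumed header name; verify in the Helicone docs
```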
## Troubleshooting

### Common Issues

- **Authentication Failed**: Ensure your `HELICONE_API_KEY` is set correctly
- **Unknown Provider**: Check that the provider is in the supported list, or use a custom `targetUrl`
- **Request Timeout**: Check your network connection and target provider availability
### Debug Mode

Enable debug logging to see detailed request/response information:

```bash
LOG_LEVEL=debug promptfoo eval
```