# Helicone AI Gateway
Helicone AI Gateway is an open-source, self-hosted AI gateway that provides a unified OpenAI-compatible interface for 100+ LLM providers. The Helicone provider in promptfoo allows you to route requests through a locally running Helicone AI Gateway instance.
## Benefits
- **Unified Interface**: Use OpenAI SDK syntax to access 100+ different LLM providers
- **Load Balancing**: Smart provider selection based on latency, cost, or custom strategies
- **Caching**: Intelligent response caching to reduce costs and improve performance
- **Rate Limiting**: Built-in rate limiting and usage controls
- **Observability**: Optional integration with Helicone's observability platform
- **Self-Hosted**: Run your own gateway instance for full control
## Setup
### Start Helicone AI Gateway
First, start a local Helicone AI Gateway instance:
```bash
# Set your provider API keys
export OPENAI_API_KEY=your_openai_key
export ANTHROPIC_API_KEY=your_anthropic_key
export GROQ_API_KEY=your_groq_key

# Start the gateway
npx @helicone/ai-gateway@latest
```
The gateway starts on `http://localhost:8080` by default.
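To confirm the gateway is reachable, you can send a test request to it directly. This is a minimal sketch that assumes the default `/ai` endpoint (described under Configuration Parameters below) and that your provider keys were exported before starting the gateway:

```bash
# Smoke test against the gateway's unified endpoint (assumed default path /ai/chat/completions).
# No Authorization header is assumed here, since the gateway itself holds the provider keys.
curl http://localhost:8080/ai/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "openai/gpt-4o-mini", "messages": [{"role": "user", "content": "Say hello"}]}'
```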
### Installation
No additional dependencies are required. The Helicone provider is built into promptfoo and works with any running Helicone AI Gateway instance.
## Usage
### Basic Usage
To route requests through your local Helicone AI Gateway:
```yaml
providers:
  - helicone:openai/gpt-4o-mini
  - helicone:anthropic/claude-3-5-sonnet
  - helicone:groq/llama-3.1-8b-instant
```
The model format is `provider/model`, as supported by the Helicone AI Gateway.
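With the providers configured, run your evaluation as usual; promptfoo sends each request to the locally running gateway, which forwards it to the matching upstream provider:

```bash
npx promptfoo@latest eval
```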
### Custom Configuration
For more advanced configuration:
```yaml
providers:
  - id: helicone:openai/gpt-4o
    config:
      # Gateway configuration
      baseUrl: http://localhost:8080 # Custom gateway URL
      router: production # Use a specific router
      # Standard OpenAI options
      temperature: 0.7
      max_tokens: 1500
      headers:
        Custom-Header: 'custom-value'
```
### Using a Custom Router
If your Helicone AI Gateway is configured with custom routers:
```yaml
providers:
  - id: helicone:openai/gpt-4o
    config:
      router: production
  - id: helicone:openai/gpt-3.5-turbo
    config:
      router: development
```
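Router names must match routers declared in the gateway's own configuration file. As a rough sketch only (the exact schema is defined by the Helicone AI Gateway documentation, not by promptfoo), a gateway config declaring these two routers might look something like:

```yaml
# ai-gateway-config.yaml -- illustrative sketch; verify field names against
# the Helicone AI Gateway docs for your gateway version
routers:
  production:
    load-balance:
      chat:
        strategy: latency # pick the lowest-latency provider
        providers:
          - openai
          - anthropic
  development:
    load-balance:
      chat:
        strategy: latency
        providers:
          - openai
```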
## Configuration Options
### Provider Format
The Helicone provider uses the format `helicone:provider/model`.
Examples:

```
helicone:openai/gpt-4o
helicone:anthropic/claude-3-5-sonnet
helicone:groq/llama-3.1-8b-instant
```
### Supported Models
The Helicone AI Gateway supports 100+ models from various providers. Some popular examples:
| Provider  | Example Models                                                     |
| --------- | ------------------------------------------------------------------ |
| OpenAI    | `openai/gpt-4o`, `openai/gpt-4o-mini`, `openai/o1-preview`         |
| Anthropic | `anthropic/claude-3-5-sonnet`, `anthropic/claude-3-haiku`          |
| Groq      | `groq/llama-3.1-8b-instant`, `groq/llama-3.1-70b-versatile`        |
| Meta      | `meta-llama/Llama-3-8b-chat-hf`, `meta-llama/Llama-3-70b-chat-hf`  |
| Google    | `google/gemma-7b-it`, `google/gemma-2b-it`                         |
For a complete list, see the Helicone AI Gateway documentation.
### Configuration Parameters
#### Gateway Options
- `baseUrl` (string): Helicone AI Gateway URL (defaults to `http://localhost:8080`)
- `router` (string): Custom router name (optional; the `/ai` endpoint is used if not specified)
- `model` (string): Override the model name from the provider specification
- `apiKey` (string): Custom API key (defaults to `placeholder-api-key`)
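As a sketch combining all four gateway options in one provider entry (the `model` override value below is illustrative, not a required setting):

```yaml
providers:
  - id: helicone:openai/gpt-4o
    config:
      baseUrl: http://localhost:8080 # where the gateway is listening
      router: production # must match a router defined in the gateway config
      model: gpt-4o-2024-08-06 # illustrative: overrides the model from the provider id
      apiKey: placeholder-api-key # the default; real provider keys live in the gateway
```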
#### OpenAI-Compatible Options
Since the provider extends OpenAI's chat completion provider, all standard OpenAI options are supported:
- `temperature`: Controls randomness (0.0 to 1.0)
- `max_tokens`: Maximum number of tokens to generate
- `top_p`: Nucleus sampling parameter
- `frequency_penalty`: Penalizes frequent tokens
- `presence_penalty`: Penalizes new tokens based on presence
- `stop`: Stop sequences
- `headers`: Additional HTTP headers
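For instance, a provider entry exercising most of these options (the values shown are illustrative):

```yaml
providers:
  - id: helicone:openai/gpt-4o-mini
    config:
      temperature: 0.2 # low randomness for more repeatable evals
      max_tokens: 256
      top_p: 0.9
      frequency_penalty: 0.5
      presence_penalty: 0.0
      stop: ['###'] # stop generation at this sequence
      headers:
        X-Eval-Run: 'promptfoo' # illustrative custom header
```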
## Examples
### Basic OpenAI Integration
```yaml
providers:
  - helicone:openai/gpt-4o-mini

prompts:
  - "Translate '{{text}}' to French"

tests:
  - vars:
      text: 'Hello world'
    assert:
      - type: contains
        value: 'Bonjour'
```
### Multi-Provider Comparison with Observability
```yaml
providers:
  - id: helicone:openai/gpt-4o
    config:
      tags: ['openai', 'gpt4']
      properties:
        model_family: 'gpt-4'
  - id: helicone:anthropic/claude-3-5-sonnet-20241022
    config:
      tags: ['anthropic', 'claude']
      properties:
        model_family: 'claude-3'

prompts:
  - 'Write a creative story about {{topic}}'

tests:
  - vars:
      topic: 'a robot learning to paint'
```
### Custom Provider with Full Configuration
```yaml
providers:
  - id: helicone:openai/gpt-4o
    config:
      baseUrl: https://custom-gateway.example.com:8080
      router: production
      apiKey: your_custom_api_key
      temperature: 0.7
      max_tokens: 1000
      headers:
        Authorization: Bearer your_target_provider_api_key
        Custom-Header: custom-value

prompts:
  - 'Answer the following question: {{question}}'

tests:
  - vars:
      question: 'What is artificial intelligence?'
```
### Caching and Performance Optimization
```yaml
providers:
  - id: helicone:openai/gpt-3.5-turbo
    config:
      cache: true
      properties:
        cache_strategy: 'aggressive'
        use_case: 'batch_processing'

prompts:
  - 'Summarize: {{text}}'

tests:
  - vars:
      text: 'Large text content to summarize...'
    assert:
      - type: latency
        threshold: 2000 # Should be faster due to caching
```
## Features
### Request Monitoring
All requests routed through Helicone are automatically logged with:
- Request/response payloads
- Token usage and costs
- Latency metrics
- Custom properties and tags
### Cost Analytics
Track costs across different providers and models:
- Per-request cost breakdown
- Aggregated cost analytics
- Cost optimization recommendations
### Caching
Intelligent response caching:
- Semantic similarity matching
- Configurable cache duration
- Cost reduction through cache hits
### Rate Limiting
Built-in rate limiting:
- Per-user limits
- Per-session limits
- Custom rate limiting rules
## Best Practices
- **Use Meaningful Tags**: Tag your requests with relevant metadata for better analytics (see the sketch after this list)
- **Track Sessions**: Use session IDs to track conversation flows
- **Enable Caching**: Enable caching for repeated or similar requests to reduce costs
- **Monitor Costs**: Regularly review cost analytics in the Helicone dashboard
- **Custom Properties**: Use custom properties to segment and analyze your usage
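A minimal sketch combining several of these practices in one provider entry (the tag and property names are illustrative):

```yaml
providers:
  - id: helicone:openai/gpt-4o-mini
    config:
      cache: true # reuse responses for repeated or similar requests
      tags: ['translation', 'nightly-eval']
      properties:
        session_id: 'eval-session-001' # illustrative: track a conversation flow
        team: 'localization' # illustrative: segment usage by team
```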
## Troubleshooting
### Common Issues
- **Authentication Failed**: Ensure your `HELICONE_API_KEY` is set correctly
- **Unknown Provider**: Check that the provider is in the supported list or use a custom `targetUrl`
- **Request Timeout**: Check your network connection and target provider availability
### Debug Mode
Enable debug logging to see detailed request/response information:
```bash
LOG_LEVEL=debug promptfoo eval
```