ElevenLabs
The ElevenLabs provider connects promptfoo to ElevenLabs' audio APIs (text-to-speech, speech-to-text, conversational agents, and supporting audio processing) so you can test and evaluate voice AI applications.
For a step-by-step tutorial, see the Evaluating ElevenLabs voice AI guide.
Quick Start
Get started with ElevenLabs in 3 steps:

1. Install promptfoo and set your API key:

```bash
npm install -g promptfoo
export ELEVENLABS_API_KEY=your_api_key_here
```

2. Create a config file (`promptfooconfig.yaml`):

```yaml
prompts:
  - 'Welcome to our customer service. How can I help you today?'

providers:
  - id: elevenlabs:tts:rachel

tests:
  - description: Generate welcome message
    assert:
      - type: cost
        threshold: 0.01
      - type: latency
        threshold: 2000
```

3. Run your first eval:

```bash
promptfoo eval
```

View results with `promptfoo view` or in the web UI.
Setup
Set your ElevenLabs API key as an environment variable:
```bash
export ELEVENLABS_API_KEY=your_api_key_here
```
Alternatively, specify the API key directly in your configuration:
```yaml
providers:
  - id: elevenlabs:tts
    config:
      apiKey: your_api_key_here
```
Get your API key from ElevenLabs Settings. The free tier includes 10,000 characters per month.
Capabilities
The ElevenLabs provider supports multiple capabilities:
Text-to-Speech (TTS)
Generate high-quality voice synthesis with multiple models and voices:
- `elevenlabs:tts:<voice_name>`: TTS with the specified voice (e.g., `elevenlabs:tts:rachel`)
- `elevenlabs:tts`: TTS with the default voice
Available models:
- `eleven_flash_v2_5`: fastest, lowest latency (~200ms)
- `eleven_turbo_v2_5`: high quality, fast
- `eleven_multilingual_v2`: best for non-English languages
- `eleven_monolingual_v1`: English only, high quality
Example:
```yaml
providers:
  - id: elevenlabs:tts:rachel
    config:
      modelId: eleven_flash_v2_5
      voiceSettings:
        stability: 0.5
        similarity_boost: 0.75
        speed: 1.0
```
Speech-to-Text (STT)
Transcribe audio with speaker diarization and accuracy metrics:
- `elevenlabs:stt`: speech-to-text transcription
Features:
- Speaker diarization (identify multiple speakers)
- Word Error Rate (WER) calculation
- Multiple language support
Example:
```yaml
providers:
  - id: elevenlabs:stt
    config:
      modelId: scribe_v1
      diarization: true
      maxSpeakers: 3
```
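When diarization is enabled, each transcribed word is usually tagged with a speaker. The minimal sketch below merges consecutive words from the same speaker into turns, which makes it easier to assert on who said what. It assumes a response with a `words` array whose entries have `text`, `start`, `end`, `speaker_id`, and an optional `type`. These field names are an assumption, so check the raw STT output before relying on them.

```javascript
// Merge diarized words into speaker turns.
// ASSUMPTION: each entry looks like { text, start, end, speaker_id, type? }.
function groupIntoTurns(words) {
  const turns = [];
  for (const word of words) {
    // ASSUMPTION: whitespace may arrive as separate 'spacing' entries; skip them.
    if (word.type === 'spacing') continue;
    const last = turns[turns.length - 1];
    if (last && last.speaker === word.speaker_id) {
      // Same speaker as the previous word: extend the current turn.
      last.text += ' ' + word.text;
      last.end = word.end;
    } else {
      // First word, or the speaker changed: start a new turn.
      turns.push({ speaker: word.speaker_id, text: word.text, start: word.start, end: word.end });
    }
  }
  return turns;
}

// Example input with two speakers.
const sample = [
  { text: 'Hi', start: 0.0, end: 0.3, speaker_id: 'speaker_0' },
  { text: 'there', start: 0.3, end: 0.6, speaker_id: 'speaker_0' },
  { text: 'Hello', start: 0.9, end: 1.2, speaker_id: 'speaker_1' },
];
console.log(groupIntoTurns(sample));
```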
Conversational Agents
Test voice AI agents with LLM backends and evaluation criteria:
- `elevenlabs:agents`: voice AI agent testing
Features:
- Multi-turn conversation simulation
- Automated evaluation criteria
- Tool calling and mocking
- LLM cascading for cost optimization
- Custom LLM endpoints
- Multi-voice conversations
- Phone integration (Twilio, SIP)
Example:
```yaml
providers:
  - id: elevenlabs:agents
    config:
      agentConfig:
        name: Customer Support Agent
        prompt: You are a helpful support agent
        voiceId: 21m00Tcm4TlvDq8ikWAM
        llmModel: gpt-4o
      evaluationCriteria:
        - name: helpfulness
          description: Agent provides helpful responses
          weight: 1.0
          passingThreshold: 0.8
```
Supporting APIs
Additional audio processing capabilities:
- `elevenlabs:history`: retrieve agent conversation history
- `elevenlabs:isolation`: remove background noise from audio
- `elevenlabs:alignment`: generate time-aligned subtitles
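Alignment output is typically a list of words with start and end times. The sketch below shows one way such timings can become subtitles. It converts an assumed `[{ text, start, end }]` array (times in seconds) into SRT cues. The input shape and the `maxWords` cue size are assumptions chosen for illustration and are not the provider's documented output.

```javascript
// Format seconds as an SRT timestamp: HH:MM:SS,mmm
function toSrtTime(seconds) {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  const pad = (n, width) => String(n).padStart(width, '0');
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)},${pad(ms, 3)}`;
}

// Build SRT text from word timings, with at most `maxWords` words per cue.
// ASSUMPTION: input is [{ text, start, end }] with times in seconds.
function wordsToSrt(words, maxWords = 7) {
  const cues = [];
  for (let i = 0; i < words.length; i += maxWords) {
    const chunk = words.slice(i, i + maxWords);
    const index = cues.length + 1; // SRT cues are numbered from 1
    const start = toSrtTime(chunk[0].start);
    const end = toSrtTime(chunk[chunk.length - 1].end);
    const text = chunk.map((w) => w.text).join(' ');
    cues.push(`${index}\n${start} --> ${end}\n${text}`);
  }
  return cues.join('\n\n') + '\n';
}
```

Short cues (a handful of words) tend to be easier to read on screen. Tune `maxWords` or split on pauses between words if your content needs it.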
Configuration Parameters
All providers support these common parameters:
| Parameter | Description |
|---|---|
| `apiKey` | Your ElevenLabs API key |
| `apiKeyEnvar` | Environment variable containing the API key |
| `baseUrl` | Custom API base URL (default: ElevenLabs API) |
| `timeout` | Request timeout in milliseconds |
| `cache` | Enable response caching |
| `cacheTTL` | Cache time-to-live in seconds |
| `enableLogging` | Enable debug logging |
| `retries` | Number of retry attempts for failed requests |
TTS-Specific Parameters
| Parameter | Description |
|---|---|
| `modelId` | TTS model (e.g., `eleven_flash_v2_5`) |
| `voiceId` | Voice ID or name (e.g., `21m00Tcm4TlvDq8ikWAM` or `rachel`) |
| `voiceSettings` | Voice customization (`stability`, `similarity_boost`, `style`, `speed`) |
| `outputFormat` | Audio format (e.g., `mp3_44100_128`, `pcm_44100`) |
| `seed` | Seed for reproducible output (best effort) |
| `streaming` | Enable WebSocket streaming for low latency |
| `pronunciationDictionary` | Custom pronunciation rules |
| `voiceDesign` | Generate a voice from a text description |
| `voiceRemix` | Modify voice characteristics (gender, accent, age) |
STT-Specific Parameters
| Parameter | Description |
|---|---|
| `modelId` | STT model (default: `scribe_v1`) |
| `language` | ISO 639-1 language code (e.g., `en`, `es`) |
| `diarization` | Enable speaker diarization |
| `maxSpeakers` | Expected number of speakers (hint) |
| `audioFormat` | Input audio format |
Agent-Specific Parameters
| Parameter | Description |
|---|---|
| `agentId` | ID of an existing agent to test |
| `agentConfig` | Ephemeral agent configuration |
| `simulatedUser` | Automated user simulation settings |
| `evaluationCriteria` | Criteria for evaluating agent performance |
| `toolMockConfig` | Mock tool responses for testing |
| `maxTurns` | Maximum conversation turns (default: 10) |
| `llmCascade` | LLM fallback configuration |
| `customLLM` | Custom LLM endpoint configuration |
| `mcpConfig` | Model Context Protocol integration |
| `multiVoice` | Multi-voice conversation configuration |
| `postCallWebhook` | Webhook notification after a conversation |
| `phoneConfig` | Twilio or SIP phone integration |
Examples
Text-to-Speech: Voice Comparison
```yaml
prompts:
  - 'Welcome to ElevenLabs. Our AI voice technology delivers natural-sounding speech.'

providers:
  - id: elevenlabs:tts:rachel
    config:
      modelId: eleven_flash_v2_5
  - id: elevenlabs:tts:clyde
    config:
      modelId: eleven_turbo_v2_5

tests:
  - description: Audio generation succeeds
    assert:
      - type: cost
        threshold: 0.01
      - type: latency
        threshold: 5000
```
Speech-to-Text: Accuracy Testing
```yaml
prompts:
  - file://audio/test-recording.mp3

providers:
  - id: elevenlabs:stt
    config:
      diarization: true

tests:
  - description: WER is acceptable
    assert:
      - type: javascript
        value: |
          const result = JSON.parse(output);
          return result.wer < 0.05; // Less than 5% error
```
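Word Error Rate is the word-level edit distance between a reference transcript and the transcription, divided by the number of reference words: WER = (S + D + I) / N, where S, D, and I count substitutions, deletions, and insertions. If you want to compute it yourself inside a `javascript` assertion, a minimal sketch follows. The normalization step (lowercasing and stripping punctuation) is an assumption. Other WER tools normalize differently, and that changes the score.

```javascript
// Normalize before comparing.
// ASSUMPTION: lowercase and replace punctuation (except apostrophes) with spaces.
function normalizeWords(text) {
  return text
    .toLowerCase()
    .replace(/[^\p{L}\p{N}\s']/gu, ' ')
    .split(/\s+/)
    .filter(Boolean);
}

// WER = (substitutions + deletions + insertions) / reference word count,
// computed as the word-level Levenshtein distance.
function wordErrorRate(reference, hypothesis) {
  const ref = normalizeWords(reference);
  const hyp = normalizeWords(hypothesis);
  if (ref.length === 0) throw new Error('Reference transcript is empty');
  // prev[j] = edit distance between the first i-1 reference words and first j hypothesis words
  let prev = Array.from({ length: hyp.length + 1 }, (_, j) => j);
  for (let i = 1; i <= ref.length; i++) {
    const curr = [i];
    for (let j = 1; j <= hyp.length; j++) {
      const cost = ref[i - 1] === hyp[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1, // deletion
        curr[j - 1] + 1, // insertion
        prev[j - 1] + cost, // substitution (or match when cost is 0)
      );
    }
    prev = curr;
  }
  return prev[hyp.length] / ref.length;
}
```

For example, comparing "the cat sat" with "the bat sat" gives one substitution over three reference words, so the WER is about 0.33.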
Conversational Agents: Evaluation
```yaml
prompts:
  - |
    User: I need help with my order
    Agent: I'd be happy to help! What's your order number?
    User: ORDER-12345

providers:
  - id: elevenlabs:agents
    config:
      agentConfig:
        prompt: You are a helpful customer support agent
        llmModel: gpt-4o
      evaluationCriteria:
        - name: greeting
          weight: 0.8
          passingThreshold: 0.8
        - name: understanding
          weight: 1.0
          passingThreshold: 0.9

tests:
  - description: Agent meets evaluation criteria
    assert:
      - type: javascript
        value: |
          const result = JSON.parse(output);
          const passed = result.analysis.evaluation_criteria_results.filter(r => r.passed);
          return passed.length >= 2;
```
Audio Processing: Pipeline
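Chain the supporting APIs. Each step uses its own provider:

```yaml
# Step 1: remove noise from the audio
providers:
  - id: elevenlabs:isolation
```

```yaml
# Step 2: transcribe the cleaned audio
providers:
  - id: elevenlabs:stt
```

```yaml
# Step 3: generate subtitles
providers:
  - id: elevenlabs:alignment
```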
Advanced Features
Pronunciation Dictionaries
Customize pronunciation for technical terms:
```yaml
providers:
  - id: elevenlabs:tts:rachel
    config:
      pronunciationDictionary:
        - word: 'API'
          pronunciation: 'A P I'
        - word: 'OAuth'
          phoneme: 'əʊɔːθ'
```
Voice Design
Generate custom voices from descriptions:
```yaml
providers:
  - id: elevenlabs:tts
    config:
      voiceDesign:
        name: Custom Voice
        description: A middle-aged American male with a deep, authoritative tone
        gender: male
        age: middle_aged
        accent: american
```
LLM Cascading
Optimize costs with automatic fallback:
```yaml
providers:
  - id: elevenlabs:agents
    config:
      llmCascade:
        primary: gpt-4o
        fallback:
          - gpt-4o-mini
          - gpt-3.5-turbo
        cascadeOnError: true
        cascadeOnLatency:
          enabled: true
          maxLatencyMs: 5000
```
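These options describe an ordered fallback. The primary model is tried first, and the next model in `fallback` is used on an error (`cascadeOnError`) or when a response exceeds `maxLatencyMs`. The sketch below illustrates that control flow only and is not the provider's implementation. `callModel` is a hypothetical async function standing in for the real LLM call. This sketch measures latency after a call completes instead of aborting slow requests.

```javascript
// Illustrative cascade: try models in order, falling back on errors or slow responses.
// `callModel` is a hypothetical async function: (model) => response.
async function runCascade(cascade, callModel) {
  const models = [cascade.primary, ...(cascade.fallback || [])];
  const maxLatencyMs = cascade.cascadeOnLatency?.enabled
    ? cascade.cascadeOnLatency.maxLatencyMs
    : Infinity;
  let lastError;
  let slowResult; // first response that succeeded but was too slow

  for (const model of models) {
    const startedAt = Date.now();
    try {
      const response = await callModel(model);
      const latency = Date.now() - startedAt;
      if (latency <= maxLatencyMs) return { model, response, latency };
      // Too slow: keep it in case no later model does better, then move on.
      if (!slowResult) slowResult = { model, response, latency };
    } catch (err) {
      // Without cascadeOnError, an error stops the cascade immediately.
      if (!cascade.cascadeOnError) throw err;
      lastError = err;
    }
  }
  // Every model failed or was slow: prefer a slow answer over none.
  if (slowResult) return slowResult;
  throw lastError;
}
```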
Multi-voice Conversations
Different voices for different characters:
```yaml
providers:
  - id: elevenlabs:agents
    config:
      multiVoice:
        characters:
          - name: Agent
            voiceId: 21m00Tcm4TlvDq8ikWAM
            role: Customer support representative
          - name: Customer
            voiceId: 2EiwWnXFnvU5JabPnv8n
            role: Customer seeking help
```
Phone Integration
Test agents with real phone calls:
```yaml
providers:
  - id: elevenlabs:agents
    config:
      phoneConfig:
        provider: twilio
        twilioAccountSid: ${TWILIO_ACCOUNT_SID}
        twilioAuthToken: ${TWILIO_AUTH_TOKEN}
        twilioPhoneNumber: '+1234567890' # Quoted so YAML keeps the leading "+" as a string
```
Cost Tracking
ElevenLabs usage costs are estimated automatically from these approximate rates:
TTS Costs:
- Flash v2.5: ~$0.015 per 1,000 characters
- Turbo v2.5: ~$0.02 per 1,000 characters
- Multilingual v2: ~$0.03 per 1,000 characters
STT Costs:
- ~$0.10 per minute of audio
Agent Costs:
- Based on conversation duration (~$0.10-0.50 per minute depending on LLM)
Supporting API Costs:
- Audio Isolation: ~$0.10 per minute
- Forced Alignment: ~$0.05 per minute
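These estimates are simple products of usage and rate. The minimal sketch below uses the approximate rates listed above. It assumes rates in USD and that every character, including spaces, is billed. Actual billing may differ.

```javascript
// Approximate rates from the list above (USD). Estimates only.
const TTS_RATE_PER_1K_CHARS = {
  eleven_flash_v2_5: 0.015,
  eleven_turbo_v2_5: 0.02,
  eleven_multilingual_v2: 0.03,
};
const STT_RATE_PER_MINUTE = 0.1;

// TTS: character count x per-character rate.
// ASSUMPTION: every character, including spaces, is billed.
function estimateTtsCost(text, modelId) {
  const rate = TTS_RATE_PER_1K_CHARS[modelId];
  if (rate === undefined) throw new Error(`No rate listed for ${modelId}`);
  return (text.length / 1000) * rate;
}

// STT: audio duration x per-minute rate.
function estimateSttCost(durationSeconds) {
  return (durationSeconds / 60) * STT_RATE_PER_MINUTE;
}
```

For example, the Quick Start prompt ("Welcome to our customer service. How can I help you today?") is 58 characters. On Flash v2.5 that costs about 58 / 1000 × 0.015 ≈ $0.0009, well under its `threshold: 0.01` cost assertion.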
Costs appear in eval results. To cap them per test, add a `cost` assertion:

```yaml
tests:
  - assert:
      - type: cost
        threshold: 0.50 # Max $0.50 per test
```
Popular Voices
Common voice IDs and names:
| Name | ID | Description |
|---|---|---|
| Rachel | 21m00Tcm4TlvDq8ikWAM | Calm, clear female |
| Clyde | 2EiwWnXFnvU5JabPnv8n | Warm male |
| Drew | 29vD33N1CtxCmqQRPOHJ | Well-rounded male |
| Paul | 5Q0t7uMcjvnagumLfvZi | Casual male |
| Domi | AZnzlk1XvdvUeBnXmlld | Energetic female |
| Bella | EXAVITQu4vr4xnSDxMaL | Expressive female |
| Antoni | ErXwobaYiN019PkySvjV | Deep male |
| Elli | MF3mGyEYCl7XYWbV9V6O | Young female |
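If you build your test matrix programmatically, a small lookup from the names in this table to their IDs helps avoid typos in provider IDs such as `elevenlabs:tts:<voice_id>`. This is a hedged sketch. Case-insensitive matching and passing unrecognized strings through unchanged (treating them as raw voice IDs) are design choices of this example, not documented provider behavior.

```javascript
// Voice names and IDs copied from the table above (keys lowercased).
const VOICE_IDS = new Map([
  ['rachel', '21m00Tcm4TlvDq8ikWAM'],
  ['clyde', '2EiwWnXFnvU5JabPnv8n'],
  ['drew', '29vD33N1CtxCmqQRPOHJ'],
  ['paul', '5Q0t7uMcjvnagumLfvZi'],
  ['domi', 'AZnzlk1XvdvUeBnXmlld'],
  ['bella', 'EXAVITQu4vr4xnSDxMaL'],
  ['antoni', 'ErXwobaYiN019PkySvjV'],
  ['elli', 'MF3mGyEYCl7XYWbV9V6O'],
]);

// Resolve a friendly name (any casing) to its voice ID.
// Unrecognized strings are returned unchanged, assuming they are already IDs.
function resolveVoiceId(nameOrId) {
  return VOICE_IDS.get(nameOrId.toLowerCase()) ?? nameOrId;
}

// Build a provider ID from a friendly name.
const providerId = `elevenlabs:tts:${resolveVoiceId('Rachel')}`; // "elevenlabs:tts:21m00Tcm4TlvDq8ikWAM"
```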
Common Workflows
Voice Quality Testing
Compare voice quality across models and voices:
```yaml
prompts:
  - 'The quick brown fox jumps over the lazy dog. This sentence contains every letter of the alphabet.'

providers:
  - id: elevenlabs:tts:rachel
    label: flash-model # Flash model (fastest)
    config:
      modelId: eleven_flash_v2_5
  - id: elevenlabs:tts:rachel
    label: turbo-model # Turbo model (high quality)
    config:
      modelId: eleven_turbo_v2_5

tests:
  - description: Flash model completes quickly
    provider: flash-model
    assert:
      - type: latency
        threshold: 1000
  - description: Turbo model stays within cost budget
    provider: turbo-model
    assert:
      - type: cost
        threshold: 0.01
```
Transcription Accuracy Pipeline
Test end-to-end TTS → STT accuracy:
```yaml
prompts:
  - |
    The meeting is scheduled for Thursday at 2 PM in conference room B.
    Please bring your laptop and quarterly report.

providers:
  - id: elevenlabs:tts:rachel
    label: tts-generator
    config:
      modelId: eleven_flash_v2_5
  - id: elevenlabs:stt
    label: stt-transcriber
    config:
      calculateWER: true

tests:
  - vars:
      referenceText: 'The meeting is scheduled for Thursday at 2 PM in conference room B. Please bring your laptop and quarterly report.'
    assert:
      - type: javascript
        value: |
          const result = JSON.parse(output);
          if (result.wer_result) {
            return result.wer_result.wer < 0.03; // Less than 3% error
          }
          return true;
```
Agent Regression Testing
Ensure agent improvements don't degrade performance:
```yaml
prompts:
  - |
    User: I need to cancel my subscription
    User: Yes, I'm sure
    User: Account email is [email protected]

providers:
  - id: elevenlabs:agents
    config:
      agentConfig:
        prompt: You are a customer service agent. Always confirm cancellations.
        llmModel: gpt-4o
      evaluationCriteria:
        - name: confirmation_requested
          description: Agent asks for confirmation before canceling
          weight: 1.0
          passingThreshold: 0.9
        - name: professional_tone
          description: Agent maintains professional tone
          weight: 0.8
          passingThreshold: 0.8

tests:
  - description: Agent handles cancellation properly
    assert:
      - type: javascript
        value: |
          const result = JSON.parse(output);
          const criteria = result.analysis.evaluation_criteria_results;
          return criteria.every(c => c.passed);
```
Best Practices
1. Choose the Right Model
- Flash v2.5: Use for real-time applications, live streaming, or when latency is critical (<200ms)
- Turbo v2.5: Use for high-quality pre-recorded content where quality matters more than speed
- Multilingual v2: Use for non-English languages or when switching between languages
- Monolingual v1: Use for English-only content requiring the highest quality
2. Optimize Voice Settings
For natural conversation:
```yaml
voiceSettings:
  stability: 0.5 # More variation
  similarity_boost: 0.75
  speed: 1.0
```

For consistent narration:

```yaml
voiceSettings:
  stability: 0.8 # Less variation
  similarity_boost: 0.85
  speed: 0.95
```

For expressiveness:

```yaml
voiceSettings:
  stability: 0.3 # High variation
  similarity_boost: 0.5
  style: 0.8 # Amplify style
  speed: 1.1
```
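Out-of-range values are an easy mistake when tuning these presets. The sketch below checks a `voiceSettings` object before an eval run. The ranges are assumptions based on the ElevenLabs API reference at the time of writing: 0 to 1 for `stability`, `similarity_boost`, and `style`, and 0.7 to 1.2 for `speed`. Confirm them against the current docs. The check reports any key it does not know as unknown, so typos surface early. This includes valid settings it does not cover, such as `use_speaker_boost`.

```javascript
// ASSUMPTION: ranges reflect the ElevenLabs API reference at the time of writing.
const VOICE_SETTING_RANGES = {
  stability: [0, 1],
  similarity_boost: [0, 1],
  style: [0, 1],
  speed: [0.7, 1.2],
};

// Return a list of problems; an empty list means every setting is in range.
function validateVoiceSettings(settings) {
  const problems = [];
  for (const [key, value] of Object.entries(settings)) {
    // Own-property check so names like "constructor" are not mistaken for settings.
    if (!Object.prototype.hasOwnProperty.call(VOICE_SETTING_RANGES, key)) {
      problems.push(`unknown setting: ${key}`);
      continue;
    }
    const [min, max] = VOICE_SETTING_RANGES[key];
    if (typeof value !== 'number' || value < min || value > max) {
      problems.push(`${key}=${value} is outside [${min}, ${max}]`);
    }
  }
  return problems;
}

// The "expressiveness" preset passes these checks.
console.log(validateVoiceSettings({ stability: 0.3, similarity_boost: 0.5, style: 0.8, speed: 1.1 })); // prints []
```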
3. Cost Optimization
Use caching for repeated phrases:
```yaml
providers:
  - id: elevenlabs:tts:rachel
    config:
      cache: true
      cacheTTL: 86400 # 24 hours
```
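Caching helps only if the cache key covers everything that changes the audio. The sketch below shows TTL-based caching keyed on text, voice, model, and voice settings. It illustrates the idea and is not promptfoo's cache implementation. The injectable clock exists only to make it testable.

```javascript
// Minimal in-memory cache with a time-to-live (illustration only).
class TtlCache {
  constructor(ttlSeconds, now = () => Date.now()) {
    this.ttlMs = ttlSeconds * 1000;
    this.now = now;
    this.entries = new Map();
  }

  // Key on every input that affects the audio.
  // Note: key order inside voiceSettings matters in this sketch.
  static key({ text, voiceId, modelId, voiceSettings = {} }) {
    return JSON.stringify([text, voiceId, modelId, voiceSettings]);
  }

  get(request) {
    const key = TtlCache.key(request);
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // Expired: evict and report a miss.
      return undefined;
    }
    return entry.value;
  }

  set(request, value) {
    this.entries.set(TtlCache.key(request), { value, expiresAt: this.now() + this.ttlMs });
  }
}
```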
Implement LLM cascading for agents:
```yaml
providers:
  - id: elevenlabs:agents
    config:
      llmCascade:
        primary: gpt-4o-mini # Cheaper first
        fallback:
          - gpt-4o # Better fallback
        cascadeOnError: true
```
Test with shorter prompts during development:
```yaml
prompts:
  - '{{shortPrompt}}' # Switch to '{{fullPrompt}}' for production runs

providers:
  - id: elevenlabs:tts:rachel

tests:
  - vars:
      shortPrompt: 'Test' # Use during dev
      fullPrompt: 'Full production message'
```
4. Agent Testing Strategy
Start simple, add complexity incrementally:
```yaml
# Phase 1: Basic functionality
evaluationCriteria:
  - name: responds
    description: Agent responds to user
    weight: 1.0
```

```yaml
# Phase 2: Add quality checks
evaluationCriteria:
  - name: responds
    weight: 0.8
  - name: accurate
    description: Response is factually correct
    weight: 1.0
```

```yaml
# Phase 3: Add conversation flow
evaluationCriteria:
  - name: responds
    weight: 0.6
  - name: accurate
    weight: 1.0
  - name: natural_flow
    description: Conversation feels natural
    weight: 0.8
```
5. Audio Quality Assurance
Always test on target platforms:
```yaml
providers:
  - id: elevenlabs:tts:rachel
    config:
      outputFormat: mp3_44100_128 # Good for web
      # outputFormat: pcm_44100 # Better for phone systems
      # outputFormat: mp3_22050_32 # Smaller files for mobile
```
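Output format names follow a `codec_sampleRate_bitrate` pattern. For example, `mp3_44100_128` is MP3 at 44.1 kHz and 128 kbps. The hedged sketch below parses these names and gives a rough file-size estimate. That estimate can help when you compare formats for a target platform. It assumes constant-bitrate MP3 and 16-bit mono PCM, so verify the PCM sample width for your format.

```javascript
// Parse names like "mp3_44100_128" into codec, sample rate (Hz), and bitrate (kbps).
function parseOutputFormat(format) {
  const [codec, sampleRate, bitrate] = format.split('_');
  const parsed = { codec, sampleRateHz: Number(sampleRate) };
  if (!codec || !Number.isInteger(parsed.sampleRateHz)) {
    throw new Error(`Unrecognized output format: ${format}`);
  }
  if (bitrate !== undefined) parsed.bitrateKbps = Number(bitrate);
  return parsed;
}

// Rough size estimate in bytes.
// ASSUMPTIONS: MP3 is constant bitrate; PCM is 16-bit (2 bytes per sample) mono.
function estimateAudioBytes(format, durationSeconds) {
  const { codec, sampleRateHz, bitrateKbps } = parseOutputFormat(format);
  if (codec === 'mp3' && bitrateKbps !== undefined) {
    return Math.round(((bitrateKbps * 1000) / 8) * durationSeconds);
  }
  if (codec === 'pcm') {
    return Math.round(sampleRateHz * 2 * durationSeconds);
  }
  throw new Error(`No size estimate for ${format}`);
}
```

Under these assumptions, 10 seconds of `mp3_44100_128` is about 160,000 bytes, `mp3_22050_32` about 40,000 bytes, and `pcm_44100` about 882,000 bytes.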
Test with diverse content:
```yaml
prompts:
  # Numbers and dates
  - 'Your appointment is on March 15th at 3:30 PM. Confirmation number: 4829.'
  # Technical terms
  - 'The API returns a JSON response with OAuth2 authentication tokens.'
  # Multi-language
  - 'Bonjour! Welcome to our multilingual support.'
  # Edge cases
  - 'Hello... um... can you hear me? Testing, 1, 2, 3.'
```
6. Monitoring and Observability
Track key metrics:
```yaml
tests:
  - assert:
      # Latency thresholds
      - type: latency
        threshold: 2000
      # Cost budgets
      - type: cost
        threshold: 0.50
      # Quality metrics
      - type: javascript
        value: |
          // Log custom metrics for inspection
          const result = JSON.parse(output);
          if (result.audio) {
            console.log('Audio size:', result.audio.sizeBytes);
            console.log('Format:', result.audio.format);
          }
          return true;
```
Use labels for organized results:
```yaml
providers:
  - label: v1-baseline
    id: elevenlabs:tts:rachel
    config:
      modelId: eleven_flash_v2_5
  - label: v2-improved
    id: elevenlabs:tts:rachel
    config:
      modelId: eleven_flash_v2_5
      voiceSettings:
        stability: 0.6 # Tweaked setting
```
Troubleshooting
API Key Issues
Error: ELEVENLABS_API_KEY environment variable is not set
Solution: Ensure your API key is properly set:
```bash
# Check if key is set
echo $ELEVENLABS_API_KEY

# Set it if missing
export ELEVENLABS_API_KEY=your_key_here

# Or add to your shell profile
echo 'export ELEVENLABS_API_KEY=your_key' >> ~/.zshrc
source ~/.zshrc
```
Authentication Errors
Error: 401 Unauthorized
Solution: Verify your API key is valid:
```bash
# Test API key directly
curl -H "xi-api-key: $ELEVENLABS_API_KEY" https://api.elevenlabs.io/v1/voices
```
If this fails, regenerate your API key at ElevenLabs Settings.
Rate Limiting
Error: 429 Too Many Requests
Solution: Add retry logic and respect rate limits:
```yaml
providers:
  - id: elevenlabs:tts:rachel
    config:
      retries: 3 # Retry failed requests
      timeout: 30000 # Allow time for retries
```
For high-volume testing, consider:
- Spreading tests over time
- Upgrading to a paid plan
- Using caching to avoid redundant requests
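The `retries` option handles retrying inside promptfoo. If you also call the ElevenLabs API from your own scripts (for example, to pre-generate audio fixtures), a common pattern is exponential backoff on 429 and 5xx responses. Below is a minimal sketch. `sendRequest` is a hypothetical async function that returns an object with a `status` field, and the provider's actual retry schedule may differ.

```javascript
// Retry with exponential backoff on HTTP 429 (rate limited) and 5xx responses.
// `sendRequest` is a hypothetical async function resolving to { status, ... }.
async function withRetries(sendRequest, { retries = 3, baseDelayMs = 1000, sleep } = {}) {
  const wait = sleep || ((ms) => new Promise((resolve) => setTimeout(resolve, ms)));
  for (let attempt = 0; ; attempt++) {
    const response = await sendRequest();
    const retryable = response.status === 429 || response.status >= 500;
    // Return successes, non-retryable errors, and the final attempt as-is.
    if (!retryable || attempt >= retries) return response;
    // Wait 1x, 2x, 4x, ... the base delay before the next attempt.
    await wait(baseDelayMs * 2 ** attempt);
  }
}
```

When many tests hit the rate limit at the same moment, adding a small random jitter to each delay keeps their retries from arriving in lockstep.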
Audio File Issues
Error: "Failed to read audio file" or "Unsupported audio format"
Solution: Ensure audio files are accessible and in a supported format:

```yaml
providers:
  - id: elevenlabs:stt
    config:
      audioFormat: mp3 # Supported: mp3, wav, flac, ogg, webm, m4a
```

Verify that the file exists and check its type:

```bash
ls -lh /path/to/audio.mp3
file /path/to/audio.mp3
```
Agent Conversation Timeouts
Error: Conversation timeout after X turns
Solution: Adjust conversation limits:
```yaml
providers:
  - id: elevenlabs:agents
    config:
      maxTurns: 20 # Increase if needed
      timeout: 120000 # 2 minutes
```
Memory Issues with Large Evals
Error: JavaScript heap out of memory
Solution: Increase Node.js memory:
```bash
export NODE_OPTIONS="--max-old-space-size=4096"
promptfoo eval
```

Or run fewer concurrent tests:

```bash
promptfoo eval --max-concurrency 2
```
Voice Not Found
Error: Voice ID not found
Solution: Use correct voice ID or name:
```yaml
providers:
  # Use official voice ID (preferred)
  - id: elevenlabs:tts:21m00Tcm4TlvDq8ikWAM
  # Or use voice name (case-sensitive)
  - id: elevenlabs:tts:Rachel
```

List available voices:

```bash
curl -H "xi-api-key: $ELEVENLABS_API_KEY" https://api.elevenlabs.io/v1/voices
```
Cost Tracking Inaccuracies
Issue: Cost estimates don't match billing
Solution: Tracked costs are estimates, calculated as:
- TTS: Character count × model rate
- STT: Audio duration × per-minute rate
- Agents: Conversation duration × LLM rates
For exact costs, check your ElevenLabs billing dashboard.
Examples
Complete working examples:
- TTS Basic - Simple voice generation
- TTS Advanced - Voice design, streaming, pronunciation
- STT - Transcription with diarization
- Agents Basic - Simple agent testing
- Agents Advanced - Multi-voice, tools, LLM cascading
- Supporting APIs - Audio processing pipeline
Learn More
Promptfoo Resources
- Evaluating ElevenLabs voice AI - Step-by-step tutorial
ElevenLabs Resources
- ElevenLabs API Documentation
- Voice Library - Browse and preview voices
- Conversational AI Docs - Agent setup guide
- Pricing - Plan comparison
- Status Page - API status and incidents