ElevenLabs

The ElevenLabs provider lets you test and evaluate ElevenLabs voice AI: text-to-speech, speech-to-text, conversational agents, and supporting audio APIs.

Tip: For a comprehensive step-by-step tutorial, see the Evaluating ElevenLabs voice AI guide.

Quick Start

Get started with ElevenLabs in 3 steps:

  1. Install and authenticate:

    npm install -g promptfoo
    export ELEVENLABS_API_KEY=your_api_key_here
  2. Create a config file (promptfooconfig.yaml):

    prompts:
      - 'Welcome to our customer service. How can I help you today?'

    providers:
      - id: elevenlabs:tts:rachel

    tests:
      - description: Generate welcome message
        assert:
          - type: cost
            threshold: 0.01
          - type: latency
            threshold: 2000
  3. Run your first eval:

    promptfoo eval

    View results with promptfoo view or in the web UI.

Setup

Set your ElevenLabs API key as an environment variable:

export ELEVENLABS_API_KEY=your_api_key_here

Alternatively, specify the API key directly in your configuration:

providers:
  - id: elevenlabs:tts
    config:
      apiKey: your_api_key_here

Tip: Get your API key from ElevenLabs Settings. Free tier includes 10,000 characters/month.

Capabilities

The ElevenLabs provider supports multiple capabilities:

Text-to-Speech (TTS)

Generate high-quality voice synthesis with multiple models and voices:

  • elevenlabs:tts:<voice_name> - TTS with specified voice (e.g., elevenlabs:tts:rachel)
  • elevenlabs:tts - TTS with default voice

Models available:

  • eleven_flash_v2_5 - Fastest, lowest latency (~200ms)
  • eleven_turbo_v2_5 - High quality, fast
  • eleven_multilingual_v2 - Best for non-English languages
  • eleven_monolingual_v1 - English only, high quality

Example:

providers:
  - id: elevenlabs:tts:rachel
    config:
      modelId: eleven_flash_v2_5
      voiceSettings:
        stability: 0.5
        similarity_boost: 0.75
        speed: 1.0

Speech-to-Text (STT)

Transcribe audio with speaker diarization and accuracy metrics:

  • elevenlabs:stt - Speech-to-text transcription

Features:

  • Speaker diarization (identify multiple speakers)
  • Word Error Rate (WER) calculation
  • Multiple language support

Example:

providers:
  - id: elevenlabs:stt
    config:
      modelId: scribe_v1
      diarization: true
      maxSpeakers: 3
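
Word Error Rate is conventionally the word-level edit distance (substitutions, deletions, and insertions) divided by the number of words in the reference text. The sketch below illustrates that definition only; it is not the provider's own implementation. The function names and the normalization step (lowercasing, stripping punctuation) are assumptions.

```javascript
// Generic WER sketch: word-level Levenshtein distance / reference word count.
// The normalization (lowercase, strip punctuation) is an assumption.
function tokenize(text) {
  return text
    .toLowerCase()
    .replace(/[^\p{L}\p{N}\s']/gu, ' ')
    .split(/\s+/)
    .filter(Boolean);
}

function wordErrorRate(reference, hypothesis) {
  const ref = tokenize(reference);
  const hyp = tokenize(hypothesis);
  if (ref.length === 0) {
    // Convention chosen for this sketch: any extra words count as 100% error.
    return hyp.length === 0 ? 0 : 1;
  }
  // prev[j] is the edit distance between the first i-1 reference words
  // and the first j hypothesis words.
  let prev = Array.from({ length: hyp.length + 1 }, (_, j) => j);
  for (let i = 1; i <= ref.length; i++) {
    const curr = [i];
    for (let j = 1; j <= hyp.length; j++) {
      const substitution = prev[j - 1] + (ref[i - 1] === hyp[j - 1] ? 0 : 1);
      const deletion = prev[j] + 1;
      const insertion = curr[j - 1] + 1;
      curr[j] = Math.min(substitution, deletion, insertion);
    }
    prev = curr;
  }
  return prev[hyp.length] / ref.length;
}
```

Under this definition, one wrong word in a 20-word reference gives a WER of 0.05, so a strict check such as `wer < 0.05` would already fail for that transcript.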

Conversational Agents

Test voice AI agents with LLM backends and evaluation criteria:

  • elevenlabs:agents - Voice AI agent testing

Features:

  • Multi-turn conversation simulation
  • Automated evaluation criteria
  • Tool calling and mocking
  • LLM cascading for cost optimization
  • Custom LLM endpoints
  • Multi-voice conversations
  • Phone integration (Twilio, SIP)

Example:

providers:
  - id: elevenlabs:agents
    config:
      agentConfig:
        name: Customer Support Agent
        prompt: You are a helpful support agent
        voiceId: 21m00Tcm4TlvDq8ikWAM
        llmModel: gpt-4o
      evaluationCriteria:
        - name: helpfulness
          description: Agent provides helpful responses
          weight: 1.0
          passingThreshold: 0.8

Supporting APIs

Additional audio processing capabilities:

  • elevenlabs:history - Retrieve agent conversation history
  • elevenlabs:isolation - Remove background noise from audio
  • elevenlabs:alignment - Generate time-aligned subtitles

Configuration Parameters

All providers support these common parameters:

| Parameter | Description |
| --- | --- |
| apiKey | Your ElevenLabs API key |
| apiKeyEnvar | Environment variable containing the API key |
| baseUrl | Custom base URL for API (default: ElevenLabs API) |
| timeout | Request timeout in milliseconds |
| cache | Enable response caching |
| cacheTTL | Cache time-to-live in seconds |
| enableLogging | Enable debug logging |
| retries | Number of retry attempts for failed requests |

TTS-Specific Parameters

| Parameter | Description |
| --- | --- |
| modelId | TTS model (e.g., eleven_flash_v2_5) |
| voiceId | Voice ID or name (e.g., 21m00Tcm4TlvDq8ikWAM or rachel) |
| voiceSettings | Voice customization (stability, similarity, style, speed) |
| outputFormat | Audio format (e.g., mp3_44100_128, pcm_44100) |
| seed | Seed for deterministic output |
| streaming | Enable WebSocket streaming for low latency |
| pronunciationDictionary | Custom pronunciation rules |
| voiceDesign | Generate voice from text description |
| voiceRemix | Modify voice characteristics (gender, accent, age) |

STT-Specific Parameters

| Parameter | Description |
| --- | --- |
| modelId | STT model (default: scribe_v1) |
| language | ISO 639-1 language code (e.g., en, es) |
| diarization | Enable speaker diarization |
| maxSpeakers | Expected number of speakers (hint) |
| audioFormat | Input audio format |

Agent-Specific Parameters

| Parameter | Description |
| --- | --- |
| agentId | Use existing agent ID |
| agentConfig | Ephemeral agent configuration |
| simulatedUser | Automated user simulation settings |
| evaluationCriteria | Evaluation criteria for agent performance |
| toolMockConfig | Mock tool responses for testing |
| maxTurns | Maximum conversation turns (default: 10) |
| llmCascade | LLM fallback configuration |
| customLLM | Custom LLM endpoint configuration |
| mcpConfig | Model Context Protocol integration |
| multiVoice | Multi-voice conversation configuration |
| postCallWebhook | Webhook notification after conversation |
| phoneConfig | Twilio or SIP phone integration |

Examples

Text-to-Speech: Voice Comparison

prompts:
  - 'Welcome to ElevenLabs. Our AI voice technology delivers natural-sounding speech.'

providers:
  - id: elevenlabs:tts:rachel
    config:
      modelId: eleven_flash_v2_5

  - id: elevenlabs:tts:clyde
    config:
      modelId: eleven_turbo_v2_5

tests:
  - description: Audio generation succeeds
    assert:
      - type: cost
        threshold: 0.01
      - type: latency
        threshold: 5000

Speech-to-Text: Accuracy Testing

prompts:
  - file://audio/test-recording.mp3

providers:
  - id: elevenlabs:stt
    config:
      diarization: true

tests:
  - description: WER is acceptable
    assert:
      - type: javascript
        value: |
          const result = JSON.parse(output);
          return result.wer < 0.05; // Less than 5% error

Conversational Agents: Evaluation

prompts:
  - |
    User: I need help with my order
    Agent: I'd be happy to help! What's your order number?
    User: ORDER-12345

providers:
  - id: elevenlabs:agents
    config:
      agentConfig:
        prompt: You are a helpful customer support agent
        llmModel: gpt-4o
      evaluationCriteria:
        - name: greeting
          weight: 0.8
          passingThreshold: 0.8
        - name: understanding
          weight: 1.0
          passingThreshold: 0.9

tests:
  - description: Agent meets evaluation criteria
    assert:
      - type: javascript
        value: |
          const result = JSON.parse(output);
          const passed = result.analysis.evaluation_criteria_results.filter(r => r.passed);
          return passed.length >= 2;
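
The inline assertion above throws if the output is not valid JSON or if the analysis is missing. A more defensive variant might look like the sketch below. The field names (`analysis.evaluation_criteria_results`, `passed`) are taken from the example; the helper names `countPassedCriteria` and `meetsCriteria` are hypothetical.

```javascript
// Defensive variant of the assertion logic. Field names come from the
// example above; the helper names are hypothetical.
function countPassedCriteria(output) {
  let result;
  try {
    result = typeof output === 'string' ? JSON.parse(output) : output;
  } catch {
    return 0; // Output was not valid JSON
  }
  const criteria = result?.analysis?.evaluation_criteria_results;
  if (!Array.isArray(criteria)) {
    return 0; // Analysis missing or in an unexpected shape
  }
  return criteria.filter((c) => c && c.passed === true).length;
}

function meetsCriteria(output, minimumPassed) {
  return countPassedCriteria(output) >= minimumPassed;
}
```

Treating malformed output as zero passed criteria makes the test fail with a clear result instead of an exception.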

Audio Processing: Pipeline

# 1. Remove noise from audio
providers:
  - id: elevenlabs:isolation

# 2. Transcribe cleaned audio
providers:
  - id: elevenlabs:stt

# 3. Generate subtitles
providers:
  - id: elevenlabs:alignment

Advanced Features

Pronunciation Dictionaries

Customize pronunciation for technical terms:

providers:
  - id: elevenlabs:tts:rachel
    config:
      pronunciationDictionary:
        - word: 'API'
          pronunciation: 'A P I'
        - word: 'OAuth'
          phoneme: 'əʊɔːθ'

Voice Design

Generate custom voices from descriptions:

providers:
  - id: elevenlabs:tts
    config:
      voiceDesign:
        name: Custom Voice
        description: A middle-aged American male with a deep, authoritative tone
        gender: male
        age: middle_aged
        accent: american

LLM Cascading

Optimize costs with automatic fallback:

providers:
  - id: elevenlabs:agents
    config:
      llmCascade:
        primary: gpt-4o
        fallback:
          - gpt-4o-mini
          - gpt-3.5-turbo
        cascadeOnError: true
        cascadeOnLatency:
          enabled: true
          maxLatencyMs: 5000
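
Conceptually, a cascade tries each model in order and moves on when a call fails or exceeds the latency budget. The sketch below illustrates that idea only and does not describe how ElevenLabs or the provider implements it. `cascade` and `callModel` are hypothetical names, and `callModel` stands in for whatever function actually invokes a model.

```javascript
// Conceptual sketch of cascading: try each model in order and fall through
// on an error or when a call exceeds the latency budget.
// `callModel` is a hypothetical async function supplied by the caller.
async function cascade(models, callModel, { maxLatencyMs = 5000 } = {}) {
  const failures = [];
  for (const model of models) {
    let timer;
    const timeout = new Promise((_, reject) => {
      timer = setTimeout(
        () => reject(new Error(`latency budget of ${maxLatencyMs}ms exceeded`)),
        maxLatencyMs,
      );
    });
    try {
      const response = await Promise.race([callModel(model), timeout]);
      return { model, response };
    } catch (err) {
      failures.push(`${model}: ${err.message}`);
    } finally {
      clearTimeout(timer); // Avoid leaving a pending timer behind
    }
  }
  throw new Error(`all models failed (${failures.join('; ')})`);
}
```

With the configuration above, the list would be `['gpt-4o', 'gpt-4o-mini', 'gpt-3.5-turbo']` and the budget 5000 ms.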

Multi-voice Conversations

Different voices for different characters:

providers:
  - id: elevenlabs:agents
    config:
      multiVoice:
        characters:
          - name: Agent
            voiceId: 21m00Tcm4TlvDq8ikWAM
            role: Customer support representative
          - name: Customer
            voiceId: 2EiwWnXFnvU5JabPnv8n
            role: Customer seeking help

Phone Integration

Test agents with real phone calls:

providers:
  - id: elevenlabs:agents
    config:
      phoneConfig:
        provider: twilio
        twilioAccountSid: ${TWILIO_ACCOUNT_SID}
        twilioAuthToken: ${TWILIO_AUTH_TOKEN}
        twilioPhoneNumber: '+1234567890' # Quoted so YAML does not parse it as an integer

Cost Tracking

ElevenLabs usage is tracked automatically:

TTS Costs:

  • Flash v2.5: ~$0.015 per 1,000 characters
  • Turbo v2.5: ~$0.02 per 1,000 characters
  • Multilingual v2: ~$0.03 per 1,000 characters

STT Costs:

  • ~$0.10 per minute of audio

Agent Costs:

  • Based on conversation duration (~$0.10-0.50 per minute depending on LLM)

Supporting API Costs:

  • Audio Isolation: ~$0.10 per minute
  • Forced Alignment: ~$0.05 per minute
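
As a rough illustration of the arithmetic, the sketch below multiplies character counts or audio duration by the approximate rates listed above. The function and constant names are hypothetical, the rates are the approximations from this page, and actual billing may differ.

```javascript
// Approximate rates copied from the lists above (USD). Actual billing may differ.
const TTS_RATE_PER_1K_CHARS = {
  eleven_flash_v2_5: 0.015,
  eleven_turbo_v2_5: 0.02,
  eleven_multilingual_v2: 0.03,
};
const PER_MINUTE_RATE = { stt: 0.1, isolation: 0.1, alignment: 0.05 };

// Note: text.length counts UTF-16 code units, which is only an
// approximation of how characters may be counted for billing.
function estimateTtsCost(text, modelId) {
  const rate = TTS_RATE_PER_1K_CHARS[modelId];
  if (rate === undefined) {
    throw new Error(`No rate listed for model: ${modelId}`);
  }
  return (text.length / 1000) * rate;
}

function estimateAudioCost(durationSeconds, kind) {
  const rate = PER_MINUTE_RATE[kind];
  if (rate === undefined) {
    throw new Error(`No rate listed for: ${kind}`);
  }
  return (durationSeconds / 60) * rate;
}
```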

View costs in eval results:

tests:
  - assert:
      - type: cost
        threshold: 0.50 # Max $0.50 per test

Common voice IDs and names:

| Name | ID | Description |
| --- | --- | --- |
| Rachel | 21m00Tcm4TlvDq8ikWAM | Calm, clear female |
| Clyde | 2EiwWnXFnvU5JabPnv8n | Warm male |
| Drew | 29vD33N1CtxCmqQRPOHJ | Well-rounded male |
| Paul | 5Q0t7uMcjvnagumLfvZi | Casual male |
| Domi | AZnzlk1XvdvUeBnXmlld | Energetic female |
| Bella | EXAVITQu4vr4xnSDxMaL | Expressive female |
| Antoni | ErXwobaYiN019PkySvjV | Deep male |
| Elli | MF3mGyEYCl7XYWbV9V6O | Young female |

Common Workflows

Voice Quality Testing

Compare voice quality across models and voices:

prompts:
  - 'The quick brown fox jumps over the lazy dog. This sentence contains every letter of the alphabet.'

providers:
  - id: elevenlabs:tts
    label: flash-model # Flash Model (Fastest)
    config:
      modelId: eleven_flash_v2_5
      voiceId: rachel

  - id: elevenlabs:tts
    label: turbo-model # Turbo Model (Best Quality)
    config:
      modelId: eleven_turbo_v2_5
      voiceId: rachel

tests:
  - description: Flash model completes quickly
    provider: flash-model
    assert:
      - type: latency
        threshold: 1000

  - description: Turbo model stays within cost budget
    provider: turbo-model
    assert:
      - type: cost
        threshold: 0.01

Transcription Accuracy Pipeline

Test end-to-end TTS → STT accuracy:

prompts:
  - |
    The meeting is scheduled for Thursday at 2 PM in conference room B.
    Please bring your laptop and quarterly report.

providers:
  - id: elevenlabs:tts:rachel
    label: tts-generator
    config:
      modelId: eleven_flash_v2_5

  - id: elevenlabs:stt
    label: stt-transcriber
    config:
      calculateWER: true

tests:
  - vars:
      referenceText: 'The meeting is scheduled for Thursday at 2 PM in conference room B. Please bring your laptop and quarterly report.'
    assert:
      - type: javascript
        value: |
          const result = JSON.parse(output);
          if (result.wer_result) {
            return result.wer_result.wer < 0.03; // Less than 3% error
          }
          return true; // No WER in this output, so there is nothing to check

Agent Regression Testing

Ensure agent improvements don't degrade performance:

prompts:
  - |
    User: I need to cancel my subscription
    User: Yes, I'm sure
    User: Account email is [email protected]

providers:
  - id: elevenlabs:agents
    config:
      agentConfig:
        prompt: You are a customer service agent. Always confirm cancellations.
        llmModel: gpt-4o
      evaluationCriteria:
        - name: confirmation_requested
          description: Agent asks for confirmation before canceling
          weight: 1.0
          passingThreshold: 0.9
        - name: professional_tone
          description: Agent maintains professional tone
          weight: 0.8
          passingThreshold: 0.8

tests:
  - description: Agent handles cancellation properly
    assert:
      - type: javascript
        value: |
          const result = JSON.parse(output);
          const criteria = result.analysis.evaluation_criteria_results;
          return criteria.every(c => c.passed);

Best Practices

1. Choose the Right Model

  • Flash v2.5: Use for real-time applications, live streaming, or when latency is critical (~200ms)
  • Turbo v2.5: Use for high-quality pre-recorded content where quality matters more than speed
  • Multilingual v2: Use for non-English languages or when switching between languages
  • Monolingual v1: Use for English-only content requiring the highest quality

2. Optimize Voice Settings

For natural conversation:

voiceSettings:
  stability: 0.5 # More variation
  similarity_boost: 0.75
  speed: 1.0

For consistent narration:

voiceSettings:
  stability: 0.8 # Less variation
  similarity_boost: 0.85
  speed: 0.95

For expressiveness:

voiceSettings:
  stability: 0.3 # High variation
  similarity_boost: 0.5
  style: 0.8 # Amplify style
  speed: 1.1

3. Cost Optimization

Use caching for repeated phrases:

providers:
  - id: elevenlabs:tts:rachel
    config:
      cache: true
      cacheTTL: 86400 # 24 hours
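
To make the effect of a TTL concrete, the sketch below shows a minimal in-memory cache whose entries expire after a fixed number of seconds. It illustrates the general technique only, not the provider's cache. The class name `TtlCache` and the choice of cache key (model, voice, and text) are assumptions.

```javascript
// Minimal in-memory TTL cache. `now` is injectable so that expiry can be
// exercised without waiting for real time to pass.
class TtlCache {
  constructor(ttlSeconds, now = () => Date.now()) {
    this.ttlMs = ttlSeconds * 1000;
    this.now = now;
    this.entries = new Map();
  }

  // Assumed key: identical text, voice, and model produce a cache hit.
  static keyFor({ text, voiceId, modelId }) {
    return JSON.stringify([modelId, voiceId, text]);
  }

  get(key) {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // Expired: drop it and report a miss
      return undefined;
    }
    return entry.value;
  }

  set(key, value) {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}
```

A repeated phrase requested within the TTL is served from the map, so it is not synthesized (or billed) a second time.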

Implement LLM cascading for agents:

providers:
  - id: elevenlabs:agents
    config:
      llmCascade:
        primary: gpt-4o-mini # Cheaper first
        fallback:
          - gpt-4o # Better fallback
        cascadeOnError: true

Test with shorter prompts during development:

providers:
  - id: elevenlabs:tts:rachel
tests:
  - vars:
      shortPrompt: 'Test' # Use during dev
      fullPrompt: 'Full production message'

4. Agent Testing Strategy

Start simple, add complexity incrementally:

# Phase 1: Basic functionality
evaluationCriteria:
  - name: responds
    description: Agent responds to user
    weight: 1.0

# Phase 2: Add quality checks
evaluationCriteria:
  - name: responds
    weight: 0.8
  - name: accurate
    description: Response is factually correct
    weight: 1.0

# Phase 3: Add conversation flow
evaluationCriteria:
  - name: responds
    weight: 0.6
  - name: accurate
    weight: 1.0
  - name: natural_flow
    description: Conversation feels natural
    weight: 0.8

5. Audio Quality Assurance

Always test on target platforms:

providers:
  - id: elevenlabs:tts:rachel
    config:
      outputFormat: mp3_44100_128 # Good for web
      # outputFormat: pcm_44100 # Better for phone systems
      # outputFormat: mp3_22050_32 # Smaller files for mobile
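
When comparing formats across platforms, it can help to split a format identifier into its parts. The sketch below reads identifiers as `<codec>_<sampleRateHz>[_<bitrateKbps>]`. That reading is inferred from the examples above and is an assumption, and `parseOutputFormat` is a hypothetical helper.

```javascript
// Parse identifiers such as "mp3_44100_128" or "pcm_44100".
// The <codec>_<sampleRateHz>[_<bitrateKbps>] reading is inferred from the
// examples above and is an assumption.
function parseOutputFormat(format) {
  const match = /^([a-z0-9]+)_(\d+)(?:_(\d+))?$/.exec(format);
  if (!match) {
    throw new Error(`Unrecognized output format: ${format}`);
  }
  const [, codec, sampleRate, bitrate] = match;
  return {
    codec,
    sampleRateHz: Number(sampleRate),
    bitrateKbps: bitrate === undefined ? null : Number(bitrate),
  };
}
```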

Test with diverse content:

prompts:
  # Numbers and dates
  - 'Your appointment is on March 15th at 3:30 PM. Confirmation number: 4829.'

  # Technical terms
  - 'The API returns a JSON response with OAuth2 authentication tokens.'

  # Multi-language
  - 'Bonjour! Welcome to our multilingual support.'

  # Edge cases
  - 'Hello... um... can you hear me? Testing, 1, 2, 3.'

6. Monitoring and Observability

Track key metrics:

tests:
  - assert:
      # Latency thresholds
      - type: latency
        threshold: 2000

      # Cost budgets
      - type: cost
        threshold: 0.50

      # Quality metrics
      - type: javascript
        value: |
          // Track custom metrics
          const result = JSON.parse(output);
          if (result.audio) {
            console.log('Audio size:', result.audio.sizeBytes);
            console.log('Format:', result.audio.format);
          }
          return true;

Use labels for organized results:

providers:
  - label: v1-baseline
    id: elevenlabs:tts:rachel
    config:
      modelId: eleven_flash_v2_5

  - label: v2-improved
    id: elevenlabs:tts:rachel
    config:
      modelId: eleven_flash_v2_5
      voiceSettings:
        stability: 0.6 # Tweaked setting

Troubleshooting

API Key Issues

Error: ELEVENLABS_API_KEY environment variable is not set

Solution: Ensure your API key is properly set:

# Check if key is set
echo $ELEVENLABS_API_KEY

# Set it if missing
export ELEVENLABS_API_KEY=your_key_here

# Or add to your shell profile
echo 'export ELEVENLABS_API_KEY=your_key' >> ~/.zshrc
source ~/.zshrc

Authentication Errors

Error: 401 Unauthorized

Solution: Verify your API key is valid:

# Test API key directly
curl -H "xi-api-key: $ELEVENLABS_API_KEY" https://api.elevenlabs.io/v1/voices

If this fails, regenerate your API key at ElevenLabs Settings.

Rate Limiting

Error: 429 Too Many Requests

Solution: Add retry logic and respect rate limits:

providers:
  - id: elevenlabs:tts:rachel
    config:
      retries: 3 # Retry failed requests
      timeout: 30000 # Allow time for retries

For high-volume testing, consider:

  • Spreading tests over time
  • Upgrading to a paid plan
  • Using caching to avoid redundant requests
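
For calls made outside the provider (for example, a script that hits the API directly), retrying on 429 with exponential backoff is a common pattern. The sketch below shows the general technique and does not describe how the provider's `retries` option works internally. `withRetries`, `backoffDelayMs`, and the `request` callback are hypothetical names.

```javascript
// Exponential backoff: 500ms, 1000ms, 2000ms, ... capped at maxDelayMs.
function backoffDelayMs(attempt, baseDelayMs = 500, maxDelayMs = 8000) {
  return Math.min(maxDelayMs, baseDelayMs * 2 ** attempt);
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// `request` is a hypothetical async function that resolves to an object
// with a numeric `status` field (for example, a fetch Response).
async function withRetries(request, { retries = 3, baseDelayMs = 500 } = {}) {
  for (let attempt = 0; ; attempt++) {
    const response = await request();
    // Return on anything other than 429, or once the retry budget is spent.
    if (response.status !== 429 || attempt >= retries) {
      return response;
    }
    await sleep(backoffDelayMs(attempt, baseDelayMs));
  }
}
```

Doubling the delay after each 429 spreads retries out instead of hammering the API at a fixed interval.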

Audio File Issues

Error: Failed to read audio file or Unsupported audio format

Solution: Ensure audio files are accessible and in supported formats:

providers:
  - id: elevenlabs:stt
    config:
      audioFormat: mp3 # Supported: mp3, wav, flac, ogg, webm, m4a

Verify file exists:

ls -lh /path/to/audio.mp3
file /path/to/audio.mp3
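
The same checks can be scripted before an eval run. The sketch below uses Node's built-in `fs` and `path` modules and the extension list from the comment above. `checkAudioFile` is a hypothetical helper, and it inspects only the file name and size, not the actual audio encoding.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Extensions taken from the supported-format comment above.
const SUPPORTED_EXTENSIONS = new Set(['mp3', 'wav', 'flac', 'ogg', 'webm', 'm4a']);

function checkAudioFile(filePath) {
  const ext = path.extname(filePath).slice(1).toLowerCase();
  if (!SUPPORTED_EXTENSIONS.has(ext)) {
    return { ok: false, reason: `unsupported extension: "${ext}"` };
  }
  try {
    fs.accessSync(filePath, fs.constants.R_OK);
  } catch {
    return { ok: false, reason: 'file is missing or not readable' };
  }
  if (fs.statSync(filePath).size === 0) {
    return { ok: false, reason: 'file is empty' };
  }
  return { ok: true };
}
```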

Agent Conversation Timeouts

Error: Conversation timeout after X turns

Solution: Adjust conversation limits:

providers:
  - id: elevenlabs:agents
    config:
      maxTurns: 20 # Increase if needed
      timeout: 120000 # 2 minutes

Memory Issues with Large Evals

Error: JavaScript heap out of memory

Solution: Increase Node.js memory:

export NODE_OPTIONS="--max-old-space-size=4096"
promptfoo eval

Or run fewer concurrent tests:

promptfoo eval --max-concurrency 2

Voice Not Found

Error: Voice ID not found

Solution: Use correct voice ID or name:

providers:
  # Use official voice ID (preferred)
  - id: elevenlabs:tts:21m00Tcm4TlvDq8ikWAM

  # Or use voice name (case-sensitive)
  - id: elevenlabs:tts:Rachel

List available voices:

curl -H "xi-api-key: $ELEVENLABS_API_KEY" https://api.elevenlabs.io/v1/voices

Cost Tracking Inaccuracies

Issue: Cost estimates don't match billing

Solution: Cost tracking is estimated based on:

  • TTS: Character count × model rate
  • STT: Audio duration × per-minute rate
  • Agents: Conversation duration × LLM rates

For exact costs, check your ElevenLabs billing dashboard.
