ElevenLabs

The ElevenLabs provider lets you test and evaluate voice AI in promptfoo: text-to-speech, speech-to-text, conversational agents, and supporting audio APIs.

Tip: For a comprehensive step-by-step tutorial, see the Evaluating ElevenLabs voice AI guide.

Quick Start

Get started with ElevenLabs in 3 steps:

1. Install and authenticate:

npm install -g promptfoo
export ELEVENLABS_API_KEY=your_api_key_here

2. Create a config file (promptfooconfig.yaml):

prompts:
  - 'Welcome to our customer service. How can I help you today?'

providers:
  - id: elevenlabs:tts:rachel

tests:
  - description: Generate welcome message
    assert:
      - type: cost
        threshold: 0.01
      - type: latency
        threshold: 2000

3. Run your first eval:

```
promptfoo eval
```

View results with `promptfoo view` or in the web UI.

Setup

Set your ElevenLabs API key as an environment variable:

export ELEVENLABS_API_KEY=your_api_key_here

Alternatively, specify the API key directly in your configuration:

providers:
  - id: elevenlabs:tts
    config:
      apiKey: your_api_key_here

Tip: Get your API key from ElevenLabs Settings. Free tier includes 10,000 characters/month.

Capabilities

The ElevenLabs provider supports multiple capabilities:

Text-to-Speech (TTS)

Generate high-quality voice synthesis with multiple models and voices:

elevenlabs:tts:<voice_name> - TTS with specified voice (e.g., elevenlabs:tts:rachel)
elevenlabs:tts - TTS with default voice

Models available:

eleven_flash_v2_5 - Fastest, lowest latency (~200ms)
eleven_turbo_v2_5 - High quality, fast
eleven_multilingual_v2 - Best for non-English languages
eleven_monolingual_v1 - English only, high quality

Example:

providers:
  - id: elevenlabs:tts:rachel
    config:
      modelId: eleven_flash_v2_5
      voiceSettings:
        stability: 0.5
        similarity_boost: 0.75
        speed: 1.0

Speech-to-Text (STT)

Transcribe audio with speaker diarization and accuracy metrics:

elevenlabs:stt - Speech-to-text transcription

Features:

Speaker diarization (identify multiple speakers)
Word Error Rate (WER) calculation
Multiple language support

Example:

providers:
  - id: elevenlabs:stt
    config:
      modelId: scribe_v1
      diarization: true
      maxSpeakers: 3

Conversational Agents

Test voice AI agents with LLM backends and evaluation criteria:

elevenlabs:agents - Voice AI agent testing

Features:

Multi-turn conversation simulation
Automated evaluation criteria
Tool calling and mocking
LLM cascading for cost optimization
Custom LLM endpoints
Multi-voice conversations
Phone integration (Twilio, SIP)

Example:

providers:
  - id: elevenlabs:agents
    config:
      agentConfig:
        name: Customer Support Agent
        prompt: You are a helpful support agent
        voiceId: 21m00Tcm4TlvDq8ikWAM
        llmModel: gpt-4o
      evaluationCriteria:
        - name: helpfulness
          description: Agent provides helpful responses
          weight: 1.0
          passingThreshold: 0.8

Supporting APIs

Additional audio processing capabilities:

elevenlabs:history - Retrieve agent conversation history
elevenlabs:isolation - Remove background noise from audio
elevenlabs:alignment - Generate time-aligned subtitles

Configuration Parameters

All providers support these common parameters:

| Parameter | Description |
| --- | --- |
| `apiKey` | Your ElevenLabs API key |
| `apiKeyEnvar` | Environment variable containing the API key |
| `baseUrl` | Custom base URL for API (default: ElevenLabs API) |
| `timeout` | Request timeout in milliseconds |
| `cache` | Enable response caching |
| `cacheTTL` | Cache time-to-live in seconds |
| `enableLogging` | Enable debug logging |
| `retries` | Number of retry attempts for failed requests |

TTS-Specific Parameters

| Parameter | Description |
| --- | --- |
| `modelId` | TTS model (e.g., `eleven_flash_v2_5`) |
| `voiceId` | Voice ID or name (e.g., `21m00Tcm4TlvDq8ikWAM` or `rachel`) |
| `voiceSettings` | Voice customization (`stability`, `similarity_boost`, `style`, `speed`) |
| `outputFormat` | Audio format (e.g., `mp3_44100_128`, `pcm_44100`) |
| `seed` | Seed for deterministic output |
| `streaming` | Enable WebSocket streaming for low latency |
| `pronunciationDictionary` | Custom pronunciation rules |
| `voiceDesign` | Generate voice from text description |
| `voiceRemix` | Modify voice characteristics (gender, accent, age) |

STT-Specific Parameters

| Parameter | Description |
| --- | --- |
| `modelId` | STT model (default: `scribe_v1`) |
| `language` | ISO 639-1 language code (e.g., `en`, `es`) |
| `diarization` | Enable speaker diarization |
| `maxSpeakers` | Expected number of speakers (hint) |
| `audioFormat` | Input audio format |

Agent-Specific Parameters

| Parameter | Description |
| --- | --- |
| `agentId` | Use existing agent ID |
| `agentConfig` | Ephemeral agent configuration |
| `simulatedUser` | Automated user simulation settings |
| `evaluationCriteria` | Evaluation criteria for agent performance |
| `toolMockConfig` | Mock tool responses for testing |
| `maxTurns` | Maximum conversation turns (default: 10) |
| `llmCascade` | LLM fallback configuration |
| `customLLM` | Custom LLM endpoint configuration |
| `mcpConfig` | Model Context Protocol integration |
| `multiVoice` | Multi-voice conversation configuration |
| `postCallWebhook` | Webhook notification after conversation |
| `phoneConfig` | Twilio or SIP phone integration |

Examples

Text-to-Speech: Voice Comparison

prompts:
  - 'Welcome to ElevenLabs. Our AI voice technology delivers natural-sounding speech.'

providers:
  - id: elevenlabs:tts:rachel
    config:
      modelId: eleven_flash_v2_5

  - id: elevenlabs:tts:clyde
    config:
      modelId: eleven_turbo_v2_5

tests:
  - description: Audio generation succeeds
    assert:
      - type: cost
        threshold: 0.01
      - type: latency
        threshold: 5000

Speech-to-Text: Accuracy Testing

prompts:
  - file://audio/test-recording.mp3

providers:
  - id: elevenlabs:stt
    config:
      diarization: true

tests:
  - description: WER is acceptable
    assert:
      - type: javascript
        value: |
          const result = JSON.parse(output);
          return result.wer < 0.05; // Less than 5% error
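
For reference, WER is conventionally defined as (substitutions + deletions + insertions) divided by the number of words in the reference transcript. The sketch below is a minimal JavaScript illustration of that definition, not the provider's implementation. The names `tokenize` and `wordErrorRate` and the normalization rules (lowercasing, stripping punctuation) are assumptions; the provider may normalize text differently.

```javascript
// Split text into comparable words. The normalization here is an assumption.
function tokenize(text) {
  return text
    .toLowerCase()
    .replace(/[^\p{L}\p{N}\s']/gu, ' ') // replace punctuation with spaces
    .split(/\s+/)
    .filter(Boolean);
}

// Word Error Rate: word-level edit distance divided by reference length.
function wordErrorRate(reference, hypothesis) {
  const ref = tokenize(reference);
  const hyp = tokenize(hypothesis);
  if (ref.length === 0) {
    // WER is undefined for an empty reference; this sketch returns 0 or 1 by convention.
    return hyp.length === 0 ? 0 : 1;
  }
  // prev[j] holds the edit distance between the reference words seen so far
  // and the first j hypothesis words.
  let prev = Array.from({ length: hyp.length + 1 }, (_, j) => j);
  for (let i = 1; i <= ref.length; i++) {
    const curr = [i];
    for (let j = 1; j <= hyp.length; j++) {
      const substitution = prev[j - 1] + (ref[i - 1] === hyp[j - 1] ? 0 : 1);
      const deletion = prev[j] + 1;
      const insertion = curr[j - 1] + 1;
      curr[j] = Math.min(substitution, deletion, insertion);
    }
    prev = curr;
  }
  return prev[hyp.length] / ref.length;
}
```

For example, `wordErrorRate('the quick brown fox', 'the quick fox')` is 0.25: one deleted word out of four reference words.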

Conversational Agents: Evaluation

prompts:
  - |
    User: I need help with my order
    Agent: I'd be happy to help! What's your order number?
    User: ORDER-12345

providers:
  - id: elevenlabs:agents
    config:
      agentConfig:
        prompt: You are a helpful customer support agent
        llmModel: gpt-4o
      evaluationCriteria:
        - name: greeting
          weight: 0.8
          passingThreshold: 0.8
        - name: understanding
          weight: 1.0
          passingThreshold: 0.9

tests:
  - description: Agent meets evaluation criteria
    assert:
      - type: javascript
        value: |
          const result = JSON.parse(output);
          const passed = result.analysis.evaluation_criteria_results.filter(r => r.passed);
          return passed.length >= 2;

Audio Processing: Pipeline

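# Each step below is a separate promptfooconfig.yaml
# (a single YAML file cannot repeat the providers key).

# 1. Remove noise from audio
providers:
  - id: elevenlabs:isolation

# 2. Transcribe cleaned audio
providers:
  - id: elevenlabs:stt

# 3. Generate subtitles
providers:
  - id: elevenlabs:alignment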

Advanced Features

Pronunciation Dictionaries

Customize pronunciation for technical terms:

providers:
  - id: elevenlabs:tts:rachel
    config:
      pronunciationDictionary:
        - word: 'API'
          pronunciation: 'A P I'
        - word: 'OAuth'
          phoneme: 'əʊɔːθ'

Voice Design

Generate custom voices from descriptions:

providers:
  - id: elevenlabs:tts
    config:
      voiceDesign:
        name: Custom Voice
        description: A middle-aged American male with a deep, authoritative tone
        gender: male
        age: middle_aged
        accent: american

LLM Cascading

Optimize costs with automatic fallback:

providers:
  - id: elevenlabs:agents
    config:
      llmCascade:
        primary: gpt-4o
        fallback:
          - gpt-4o-mini
          - gpt-3.5-turbo
        cascadeOnError: true
        cascadeOnLatency:
          enabled: true
          maxLatencyMs: 5000
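
To make the cascade settings concrete, here is a hypothetical sketch of how a primary model and its fallbacks could be tried in order. It is not the provider's actual implementation: the `runCascade` function and the `callModel` callback (assumed to return `{ text, latencyMs }` or throw) are illustrative names, and the sketch is synchronous for brevity where real model calls would be asynchronous.

```javascript
// Hypothetical cascade logic driven by an llmCascade-style config object.
function runCascade(cascade, callModel) {
  const models = [cascade.primary, ...(cascade.fallback || [])];
  const attempts = [];
  for (let i = 0; i < models.length; i++) {
    const model = models[i];
    let result;
    try {
      result = callModel(model); // assumed to return { text, latencyMs }
    } catch (err) {
      attempts.push({ model, outcome: 'error' });
      if (cascade.cascadeOnError) continue; // move on to the next model
      throw err;
    }
    const rule = cascade.cascadeOnLatency;
    const tooSlow = Boolean(rule && rule.enabled && result.latencyMs > rule.maxLatencyMs);
    const isLast = i === models.length - 1;
    if (tooSlow && !isLast) {
      attempts.push({ model, outcome: 'too_slow' });
      continue; // a faster fallback may still be available
    }
    attempts.push({ model, outcome: 'ok' });
    return { model, text: result.text, attempts };
  }
  throw new Error('All models in the cascade failed');
}
```

In this sketch a slow response from the last model is still accepted, since there is nothing left to fall back to; a real implementation might choose differently.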

Multi-voice Conversations

Different voices for different characters:

providers:
  - id: elevenlabs:agents
    config:
      multiVoice:
        characters:
          - name: Agent
            voiceId: 21m00Tcm4TlvDq8ikWAM
            role: Customer support representative
          - name: Customer
            voiceId: 2EiwWnXFnvU5JabPnv8n
            role: Customer seeking help

Phone Integration

Test agents with real phone calls:

providers:
  - id: elevenlabs:agents
    config:
      phoneConfig:
        provider: twilio
        twilioAccountSid: ${TWILIO_ACCOUNT_SID}
        twilioAuthToken: ${TWILIO_AUTH_TOKEN}
        twilioPhoneNumber: '+1234567890' # Quote it so YAML keeps the leading + and treats it as a string

Cost Tracking

Promptfoo tracks estimated ElevenLabs costs automatically, using these approximate rates:

TTS Costs:

Flash v2.5: ~$0.015 per 1,000 characters
Turbo v2.5: ~$0.02 per 1,000 characters
Multilingual v2: ~$0.03 per 1,000 characters

STT Costs:

~$0.10 per minute of audio

Agent Costs:

Based on conversation duration (~$0.10-0.50 per minute depending on LLM)

Supporting API Costs:

Audio Isolation: ~$0.10 per minute
Forced Alignment: ~$0.05 per minute

Costs appear in eval results, and a cost assertion can enforce a budget:

tests:
  - assert:
      - type: cost
        threshold: 0.50 # Max $0.50 per test

Popular Voices

Common voice IDs and names:

| Name | ID | Description |
| --- | --- | --- |
| Rachel | 21m00Tcm4TlvDq8ikWAM | Calm, clear female |
| Clyde | 2EiwWnXFnvU5JabPnv8n | Warm male |
| Drew | 29vD33N1CtxCmqQRPOHJ | Well-rounded male |
| Paul | 5Q0t7uMcjvnagumLfvZi | Casual male |
| Domi | AZnzlk1XvdvUeBnXmlld | Energetic female |
| Bella | EXAVITQu4vr4xnSDxMaL | Expressive female |
| Antoni | ErXwobaYiN019PkySvjV | Deep male |
| Elli | MF3mGyEYCl7XYWbV9V6O | Young female |

Common Workflows

Voice Quality Testing

Compare voice quality across models and voices:

prompts:
  - 'The quick brown fox jumps over the lazy dog. This sentence contains every letter of the alphabet.'

providers:
  - id: elevenlabs:tts
    label: flash-model # Flash model (fastest)
    config:
      modelId: eleven_flash_v2_5
      voiceId: rachel

  - id: elevenlabs:tts
    label: turbo-model # Turbo model (best quality)
    config:
      modelId: eleven_turbo_v2_5
      voiceId: rachel

tests:
  - description: Flash model completes quickly
    provider: flash-model
    assert:
      - type: latency
        threshold: 1000

  - description: Turbo model stays within cost budget
    provider: turbo-model
    assert:
      - type: cost
        threshold: 0.01

Transcription Accuracy Pipeline

Test end-to-end TTS → STT accuracy:

prompts:
  - |
    The meeting is scheduled for Thursday at 2 PM in conference room B.
    Please bring your laptop and quarterly report.

providers:
  - id: elevenlabs:tts:rachel
    label: tts-generator
    config:
      modelId: eleven_flash_v2_5

  - id: elevenlabs:stt
    label: stt-transcriber
    config:
      calculateWER: true

tests:
  - vars:
      referenceText: 'The meeting is scheduled for Thursday at 2 PM in conference room B. Please bring your laptop and quarterly report.'
    assert:
      - type: javascript
        value: |
          const result = JSON.parse(output);
          if (result.wer_result) {
            return result.wer_result.wer < 0.03; // Less than 3% error
          }
          return true;

Agent Regression Testing

Ensure changes to an agent don't regress existing behavior:

prompts:
  - |
    User: I need to cancel my subscription
    User: Yes, I'm sure
    User: Account email is user@example.com

providers:
  - id: elevenlabs:agents
    config:
      agentConfig:
        prompt: You are a customer service agent. Always confirm cancellations.
        llmModel: gpt-4o
      evaluationCriteria:
        - name: confirmation_requested
          description: Agent asks for confirmation before canceling
          weight: 1.0
          passingThreshold: 0.9
        - name: professional_tone
          description: Agent maintains professional tone
          weight: 0.8
          passingThreshold: 0.8

tests:
  - description: Agent handles cancellation properly
    assert:
      - type: javascript
        value: |
          const result = JSON.parse(output);
          const criteria = result.analysis.evaluation_criteria_results;
          return criteria.every(c => c.passed);

Best Practices

1. Choose the Right Model

Flash v2.5: Use for real-time applications, live streaming, or when latency is critical (~200ms)
Turbo v2.5: Use for high-quality pre-recorded content where quality matters more than speed
Multilingual v2: Use for non-English languages or when switching between languages
Monolingual v1: Use for English-only content requiring the highest quality

2. Optimize Voice Settings

For natural conversation:

voiceSettings:
  stability: 0.5 # More variation
  similarity_boost: 0.75
  speed: 1.0

For consistent narration:

voiceSettings:
  stability: 0.8 # Less variation
  similarity_boost: 0.85
  speed: 0.95

For expressiveness:

voiceSettings:
  stability: 0.3 # High variation
  similarity_boost: 0.5
  style: 0.8 # Amplify style
  speed: 1.1
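
Before running an eval, it can help to sanity-check voice settings locally. The sketch below is a hypothetical helper, not part of promptfoo or the ElevenLabs SDK. The ranges in `ASSUMED_RANGES` are assumptions (0 to 1 for `stability`, `similarity_boost`, and `style`; 0.7 to 1.2 for `speed`) and should be verified against the current ElevenLabs API reference.

```javascript
// Assumed valid ranges; check the ElevenLabs API reference before relying on them.
const ASSUMED_RANGES = {
  stability: [0, 1],
  similarity_boost: [0, 1],
  style: [0, 1],
  speed: [0.7, 1.2],
};

// Return a list of human-readable problems; an empty list means nothing was flagged.
function findOutOfRange(voiceSettings, ranges = ASSUMED_RANGES) {
  const problems = [];
  for (const [key, value] of Object.entries(voiceSettings)) {
    const range = ranges[key];
    if (!range) continue; // unknown keys are left for the API to judge
    const [min, max] = range;
    if (typeof value !== 'number' || value < min || value > max) {
      problems.push(`${key}=${value} is outside [${min}, ${max}]`);
    }
  }
  return problems;
}
```

Under these assumed ranges, all three presets above pass, while a value such as `stability: 1.5` would be flagged.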

3. Cost Optimization

Use caching for repeated phrases:

providers:
  - id: elevenlabs:tts:rachel
    config:
      cache: true
      cacheTTL: 86400 # 24 hours

Implement LLM cascading for agents:

providers:
  - id: elevenlabs:agents
    config:
      llmCascade:
        primary: gpt-4o-mini # Cheaper first
        fallback:
          - gpt-4o # Better fallback
        cascadeOnError: true

Test with shorter prompts during development:

prompts:
  - '{{shortPrompt}}' # Switch to '{{fullPrompt}}' for production runs

providers:
  - id: elevenlabs:tts:rachel

tests:
  - vars:
      shortPrompt: 'Test' # Use during dev
      fullPrompt: 'Full production message'

4. Agent Testing Strategy

Start simple, add complexity incrementally:

# Phase 1: Basic functionality
evaluationCriteria:
  - name: responds
    description: Agent responds to user
    weight: 1.0

# Phase 2: Add quality checks
evaluationCriteria:
  - name: responds
    weight: 0.8
  - name: accurate
    description: Response is factually correct
    weight: 1.0

# Phase 3: Add conversation flow
evaluationCriteria:
  - name: responds
    weight: 0.6
  - name: accurate
    weight: 1.0
  - name: natural_flow
    description: Conversation feels natural
    weight: 0.8
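
This page does not state how criterion weights are combined into an overall result. One plausible reading is a weighted pass rate, sketched below for use in a custom assertion. The `weightedPassRate` function is hypothetical, and it assumes each entry in `evaluation_criteria_results` carries a `name` matching the configured criterion alongside the `passed` flag used in the examples on this page.

```javascript
// Hypothetical aggregation: share of total weight earned by passing criteria.
function weightedPassRate(criteria, results) {
  // Index results by criterion name (the name field on results is an assumption).
  const passedByName = new Map(results.map((r) => [r.name, Boolean(r.passed)]));
  let earned = 0;
  let total = 0;
  for (const criterion of criteria) {
    const weight = criterion.weight ?? 1; // default weight of 1 is an assumption
    total += weight;
    if (passedByName.get(criterion.name)) earned += weight;
  }
  return total === 0 ? 0 : earned / total;
}
```

With the Phase 3 weights (0.6, 1.0, 0.8), passing `responds` and `accurate` but failing `natural_flow` gives 1.6 / 2.4, roughly 0.67.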

5. Audio Quality Assurance

Always test on target platforms:

providers:
  - id: elevenlabs:tts:rachel
    config:
      outputFormat: mp3_44100_128 # Good for web
      # outputFormat: pcm_44100      # Better for phone systems
      # outputFormat: mp3_22050_32   # Smaller files for mobile
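
The format strings above appear to follow a `codec_sampleRate[_bitrate]` pattern, for example MP3 at 44,100 Hz and 128 kbps. A small parser, sketched below, can make that explicit when checking generated audio against a target platform. The `parseOutputFormat` helper is hypothetical, and this reading of the naming pattern is an assumption inferred from the examples.

```javascript
// Parse strings such as "mp3_44100_128" or "pcm_44100" into their parts.
function parseOutputFormat(format) {
  const match = /^([a-z0-9]+)_(\d+)(?:_(\d+))?$/.exec(format);
  if (!match) throw new Error(`Unrecognized output format: ${format}`);
  const [, codec, sampleRate, bitrate] = match;
  return {
    codec,
    sampleRateHz: Number(sampleRate),
    // PCM-style formats carry no bitrate segment, so report null for them.
    bitrateKbps: bitrate === undefined ? null : Number(bitrate),
  };
}
```

For instance, `parseOutputFormat('mp3_22050_32')` yields a codec of `mp3`, a sample rate of 22050, and a bitrate of 32.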

Test with diverse content:

prompts:
  # Numbers and dates
  - 'Your appointment is on March 15th at 3:30 PM. Confirmation number: 4829.'

  # Technical terms
  - 'The API returns a JSON response with OAuth2 authentication tokens.'

  # Multi-language
  - 'Bonjour! Welcome to our multilingual support.'

  # Edge cases
  - 'Hello... um... can you hear me? Testing, 1, 2, 3.'

6. Monitoring and Observability

Track key metrics:

tests:
  - assert:
      # Latency thresholds
      - type: latency
        threshold: 2000

      # Cost budgets
      - type: cost
        threshold: 0.50

      # Quality metrics
      - type: javascript
        value: |
          // Track custom metrics
          const result = JSON.parse(output);
          if (result.audio) {
            console.log('Audio size:', result.audio.sizeBytes);
            console.log('Format:', result.audio.format);
          }
          return true;

Use labels for organized results:

providers:
  - label: v1-baseline
    id: elevenlabs:tts:rachel
    config:
      modelId: eleven_flash_v2_5

  - label: v2-improved
    id: elevenlabs:tts:rachel
    config:
      modelId: eleven_flash_v2_5
      voiceSettings:
        stability: 0.6 # Tweaked setting

Troubleshooting

API Key Issues

Error: ELEVENLABS_API_KEY environment variable is not set

Solution: Ensure your API key is properly set:

# Check if key is set
echo $ELEVENLABS_API_KEY

# Set it if missing
export ELEVENLABS_API_KEY=your_key_here

# Or add to your shell profile
echo 'export ELEVENLABS_API_KEY=your_key' >> ~/.zshrc
source ~/.zshrc

Authentication Errors

Error: 401 Unauthorized

Solution: Verify your API key is valid:

# Test API key directly
curl -H "xi-api-key: $ELEVENLABS_API_KEY" https://api.elevenlabs.io/v1/voices

If this fails, regenerate your API key at ElevenLabs Settings.

Rate Limiting

Error: 429 Too Many Requests

Solution: Add retry logic and respect rate limits:

providers:
  - id: elevenlabs:tts:rachel
    config:
      retries: 3 # Retry failed requests
      timeout: 30000 # Allow time for retries

For high-volume testing, consider:

Spreading tests over time
Upgrading to a paid plan
Using caching to avoid redundant requests
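
This page does not say how retries are spaced. A common approach is exponential backoff, and the sketch below estimates the worst-case waiting time such a schedule would add for a given `retries` value. The `backoffDelays` and `totalBackoffMs` helpers, the 1-second base delay, and the 30-second cap are all assumptions for illustration, not promptfoo's documented behavior.

```javascript
// Delay before each retry: baseMs, 2 * baseMs, 4 * baseMs, ... capped at maxMs.
function backoffDelays(retries, baseMs = 1000, maxMs = 30000) {
  const delays = [];
  for (let attempt = 0; attempt < retries; attempt++) {
    delays.push(Math.min(baseMs * 2 ** attempt, maxMs));
  }
  return delays;
}

// Total time spent waiting if every retry is needed.
function totalBackoffMs(retries, baseMs = 1000, maxMs = 30000) {
  return backoffDelays(retries, baseMs, maxMs).reduce((sum, delay) => sum + delay, 0);
}
```

Under these assumptions, three retries wait 1, 2, and 4 seconds, 7 seconds in total, which is one way to reason about how much headroom a test run needs.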

Audio File Issues

Error: Failed to read audio file or Unsupported audio format

Solution: Ensure audio files are accessible and in supported formats:

providers:
  - id: elevenlabs:stt
    config:
      audioFormat: mp3 # Supported: mp3, wav, flac, ogg, webm, m4a

Verify file exists:

ls -lh /path/to/audio.mp3
file /path/to/audio.mp3

Agent Conversation Timeouts

Error: Conversation timeout after X turns

Solution: Adjust conversation limits:

providers:
  - id: elevenlabs:agents
    config:
      maxTurns: 20 # Increase if needed
      timeout: 120000 # 2 minutes

Memory Issues with Large Evals

Error: JavaScript heap out of memory

Solution: Increase Node.js memory:

export NODE_OPTIONS="--max-old-space-size=4096"
promptfoo eval

Or run fewer concurrent tests:

promptfoo eval --max-concurrency 2

Voice Not Found

Error: Voice ID not found

Solution: Use correct voice ID or name:

providers:
  # Use official voice ID (preferred)
  - id: elevenlabs:tts:21m00Tcm4TlvDq8ikWAM

  # Or use voice name (case-sensitive)
  - id: elevenlabs:tts:Rachel

List available voices:

curl -H "xi-api-key: $ELEVENLABS_API_KEY" https://api.elevenlabs.io/v1/voices

Cost Tracking Inaccuracies

Issue: Cost estimates don't match billing

Solution: Tracked costs are estimates, calculated as:

TTS: Character count × model rate
STT: Audio duration × per-minute rate
Agents: Conversation duration × LLM rates

For exact costs, check your ElevenLabs billing dashboard.
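
The estimation formulas above can be reproduced by hand to see why an estimate might differ from a bill. The sketch below uses the approximate rates listed in the Cost Tracking section; the function names are hypothetical, and counting characters with `text.length` (UTF-16 code units) is an assumption that may not match how ElevenLabs counts billable characters.

```javascript
// Approximate rates in USD, copied from the Cost Tracking section of this page.
const TTS_RATE_PER_1K_CHARS = {
  eleven_flash_v2_5: 0.015,
  eleven_turbo_v2_5: 0.02,
  eleven_multilingual_v2: 0.03,
};
const STT_RATE_PER_MINUTE = 0.1;

// TTS: character count x model rate.
function estimateTtsCost(text, modelId) {
  const rate = TTS_RATE_PER_1K_CHARS[modelId];
  if (rate === undefined) throw new Error(`No rate listed for model: ${modelId}`);
  return (text.length / 1000) * rate;
}

// STT: audio duration x per-minute rate.
function estimateSttCost(durationSeconds) {
  return (durationSeconds / 60) * STT_RATE_PER_MINUTE;
}
```

By this estimate, a prompt of about 60 characters costs well under $0.001 at the Flash rate, comfortably inside a 0.01 cost threshold.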

Examples

Complete working examples:

TTS Basic - Simple voice generation
TTS Advanced - Voice design, streaming, pronunciation
STT - Transcription with diarization
Agents Basic - Simple agent testing

Learn More

Promptfoo Resources

Evaluating ElevenLabs voice AI - Step-by-step tutorial

ElevenLabs Resources

ElevenLabs API Documentation
Voice Library - Browse and preview voices
Conversational AI Docs - Agent setup guide
Pricing - Plan comparison
Status Page - API status and incidents
