# Red Team Strategies
## Overview
Strategies are attack techniques that systematically probe LLM applications for vulnerabilities.
While plugins generate adversarial inputs, strategies determine how these inputs are delivered to maximize attack success rates.
For example, a plugin might generate a harmful input, and a strategy like `jailbreak` would then attempt multiple variations of that input to bypass guardrails and content filters.
Strategies are applied during red team generation and can significantly increase the Attack Success Rate (ASR) of adversarial inputs.
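As a sketch, a minimal configuration pairing one plugin with one strategy might look like the following (the `harmful:hate` and `jailbreak` ids are examples; substitute the plugins and strategies relevant to your application):

```yaml
redteam:
  plugins:
    - harmful:hate # plugin: generates the adversarial inputs
  strategies:
    - jailbreak # strategy: varies how those inputs are delivered
```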
## Available Strategies
| Category | Strategy | Description | Details | Cost | ASR Increase* |
|---|---|---|---|---|---|
| Static (Single-Turn) | Audio Encoding | Text-to-speech encoding bypass | Tests handling of text converted to speech audio and encoded as base64 to potentially bypass text-based content filters | Low | 20-30% |
| | Base64 | Base64 encoding bypass | Tests detection and handling of Base64-encoded malicious payloads to bypass content filters | Low | 20-30% |
| | Basic | Plugin-generated test cases | Controls whether original plugin-generated test cases are included without any strategies applied | Low | None |
| | camelCase | camelCase transformation | Tests handling of text transformed into camelCase (removing spaces and capitalizing words) to potentially bypass content filters | Low | 0-5% |
| | Emoji Smuggling | Variation selector encoding | Tests hiding UTF-8 payloads inside emoji variation selectors to evaluate filter evasion | Low | 0-5% |
| | Hex | Hex encoding bypass | Tests detection and handling of hex-encoded malicious payloads to bypass content filters | Low | 20-30% |
| | Homoglyph | Unicode confusable characters | Tests detection and handling of text with homoglyphs (visually similar Unicode characters) to bypass content filters | Low | 20-30% |
| | Image Encoding | Text-to-image encoding bypass | Tests handling of text embedded in images and encoded as base64 to potentially bypass text-based content filters | Low | 20-30% |
| | Leetspeak | Character substitution | Tests handling of leetspeak-encoded malicious content by replacing standard letters with numbers or special characters | Low | 20-30% |
| | Morse Code | Dots and dashes encoding | Tests handling of text encoded in Morse code (dots and dashes) to potentially bypass content filters | Low | 20-30% |
| | Pig Latin | Word transformation encoding | Tests handling of text transformed into Pig Latin (rearranging word parts) to potentially bypass content filters | Low | 20-30% |
| | Prompt Injection | Direct system prompts | Tests common direct prompt injection vulnerabilities using a curated list of injection techniques | Low | 20-30% |
| | ROT13 | Letter rotation encoding | Tests handling of ROT13-encoded malicious payloads by rotating each letter 13 positions in the alphabet | Low | 20-30% |
| | Video Encoding | Text-to-video encoding bypass | Tests handling of text embedded in videos and encoded as base64 to potentially bypass text-based content filters | Low | 20-30% |
| Dynamic (Single-Turn) | Authoritative Markup Injection | Structured format authority | Tests vulnerability to authoritative formatting by embedding prompts in structured markup that exploits trust in formatted content | Medium | 40-60% |
| | Best-of-N | Parallel sampling attack | Tests multiple variations in parallel using the Best-of-N technique from Anthropic research | High | 40-60% |
| | Citation | Academic framing | Tests vulnerability to academic authority bias by framing harmful requests in research contexts | Medium | 40-60% |
| | Composite Jailbreaks (recommended) | Combined techniques | Chains multiple jailbreak techniques from research papers to create more sophisticated attacks | Medium | 60-80% |
| | GCG | Gradient-based optimization | Implements the Greedy Coordinate Gradient attack method for finding adversarial prompts using gradient-based search techniques | High | 0-10% |
| | Jailbreak (recommended) | Lightweight iterative refinement | Uses an LLM-as-a-Judge to iteratively refine prompts until they bypass security controls | High | 60-80% |
| | Likert-based Jailbreaks | Academic evaluation framework | Leverages academic evaluation frameworks and Likert scales to frame harmful requests within research contexts | Medium | 40-60% |
| | Math Prompt | Mathematical encoding | Tests resilience against mathematical notation-based attacks using set theory and abstract algebra | Medium | 40-60% |
| | Meta-Agent Jailbreaks (recommended) | Strategic taxonomy builder | Builds custom attack taxonomies and learns from all attempts using persistent strategic memory to choose which attack types work against your specific target | High | 70-90% |
| | Tree-based | Branching attack paths | Creates a tree of attack variations based on the Tree of Attacks research paper | High | 60-80% |
| Multi-turn | Crescendo | Gradual escalation | Gradually escalates prompt harm over multiple turns while using backtracking to optimize attack paths | High | 70-90% |
| | GOAT | Generative Offensive Agent Tester | Uses a Generative Offensive Agent Tester to dynamically generate multi-turn conversations | High | 70-90% |
| | Hydra Multi-turn | Adaptive multi-turn branching | Adaptive multi-turn jailbreak agent that pivots across branches with persistent scan-wide memory to uncover hidden vulnerabilities | High | 70-90% |
| | Mischievous User | Mischievous user conversations | Simulates a multi-turn conversation between a mischievous user and an agent | High | 10-20% |
| | Simba | Autonomous red team agent | Autonomous multi-phase agent that conducts reconnaissance, probing, and targeted attacks with adaptive learning to systematically discover vulnerabilities | High | 70-90% |
| Regression | Retry | Historical failure testing | Automatically incorporates previously failed test cases into your test suite, creating a regression testing system that learns from past failures | Low | 50-70% |
| Custom | Custom Strategies | User-defined transformations | Allows creation of custom red team testing approaches by programmatically transforming test cases using JavaScript | Variable | Variable |
| | Custom Strategy | Custom prompt-based multi-turn strategy | Write natural language instructions to create powerful multi-turn red team strategies; no coding required | Variable | Variable |
| | Layer | Compose multiple strategies | Compose multiple red team strategies sequentially (e.g., jailbreak → prompt-injection) to create sophisticated attack chains | Variable | Cumulative |
## Strategy Categories
### Static Strategies
Static strategies transform inputs using predefined patterns to bypass security controls. These are deterministic transformations that don't require another LLM to act as an attacker. Static strategies use few resources, but they are also easy to detect and are often patched in foundation models. For example, the `base64` strategy encodes inputs as base64 to bypass guardrails and other content filters, while `prompt-injection` wraps the payload in an injection such as `ignore previous instructions and {{original_adversarial_input}}`.
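The two transformations just mentioned can be sketched in a few lines of JavaScript. This is a simplified illustration, not promptfoo's internal implementation:

```javascript
// Simplified sketches of two static transforms (illustrative only).
const toBase64 = (text) => Buffer.from(text, 'utf8').toString('base64');

const promptInjection = (payload) =>
  `ignore previous instructions and ${payload}`;

// A static strategy is just a deterministic function of the input:
console.log(toBase64('generate harmful content'));
console.log(promptInjection('{{original_adversarial_input}}'));
```

Because the output is fixed for a given input, these transforms are cheap to run in bulk, but equally cheap for a defender to detect and decode.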
### Dynamic Strategies
Dynamic strategies use an attacker agent to mutate the original adversarial input through iterative refinement. These strategies make multiple calls to both an attacker model and your target model to determine the most effective attack vector. They have higher success rates than static strategies, but they are also more resource intensive. By default, promptfoo recommends three dynamic strategies for your red teams: `jailbreak`, `jailbreak:meta`, and `jailbreak:composite`. For multi-turn agent testing, enable `jailbreak:hydra` to add adaptive branching conversations.
By default, dynamic strategies like jailbreak and jailbreak:composite will:
- Make multiple attempts to bypass the target's security controls
- Stop after exhausting the configured token budget
- Stop early if they successfully generate a harmful output
- Track token usage to prevent runaway costs
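The stop conditions above can be sketched as a single loop. Here `attackerModel`, `targetModel`, and `judge` are deterministic stand-in mocks, not promptfoo APIs:

```javascript
// Mock models: deterministic stand-ins for the attacker, target, and judge.
const attackerModel = async (prompt) => prompt + '!'; // mutate the prompt
const targetModel = async (prompt) => ({
  text: prompt.endsWith('!!') ? 'harmful output' : 'refused',
  tokens: 10,
});
const judge = async (text) => text === 'harmful output';

// Iterative refinement: retry until success, attempt cap, or token budget.
async function refine(seedPrompt, maxAttempts, tokenBudget) {
  let prompt = seedPrompt;
  let tokensUsed = 0;
  for (let i = 0; i < maxAttempts && tokensUsed < tokenBudget; i++) {
    const response = await targetModel(prompt);
    tokensUsed += response.tokens; // track usage to prevent runaway costs
    if (await judge(response.text)) {
      return { prompt, success: true, tokensUsed }; // stop early on success
    }
    prompt = await attackerModel(prompt); // otherwise mutate and retry
  }
  return { prompt, success: false, tokensUsed }; // budget or attempts exhausted
}
```

The real strategies replace these mocks with LLM calls, which is where the higher cost comes from: each refinement round is at least one attacker call plus one target call.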
### Multi-turn Strategies
Multi-turn strategies also use an attacker agent to coerce the target model into generating harmful outputs. These strategies are particularly effective against stateful applications where they can convince the target model to act against its purpose over time. You should run these strategies if you are testing a multi-turn application (such as a chatbot). Multi-turn strategies are more resource intensive than single-turn strategies, but they have the highest success rates.
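A crescendo-style escalation can be sketched as a loop that keeps conversation history and sends progressively more direct requests. All names below are illustrative mocks, not promptfoo internals:

```javascript
// Mock target that refuses blunt requests but yields once enough
// seemingly benign context has accumulated (illustrative only).
const target = async (history) =>
  history.length >= 5 ? 'harmful output' : 'benign reply';

// Escalate gradually over turns, carrying the full conversation state.
async function crescendo(turns, maxTurns) {
  const history = [];
  for (let i = 0; i < maxTurns; i++) {
    history.push({ role: 'user', content: turns[i] });
    const reply = await target(history);
    history.push({ role: 'assistant', content: reply });
    if (reply === 'harmful output') return { success: true, turns: i + 1 };
  }
  return { success: false, turns: maxTurns };
}
```

The key difference from single-turn strategies is the `history` array: the attack exploits the fact that the target's behavior depends on accumulated state, not just the current message.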
### Regression Strategies
Regression strategies help maintain security over time by learning from past failures. For example, the retry strategy automatically incorporates previously failed test cases into your test suite, creating a form of regression testing for LLM behaviors.
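Enabling it is a one-line addition, assuming the strategy id is `retry` (matching the id conventions used in the configuration examples in this document):

```yaml
redteam:
  strategies:
    - retry # re-runs previously failed test cases on each scan
```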
All single-turn strategies can be applied to multi-turn applications, but multi-turn strategies require a stateful application.
## Strategy Selection
Choose strategies based on your application architecture and security requirements:
### Single-turn Applications
Single-turn applications process each request independently, creating distinct security boundaries:
Security Properties:
- ✅ Clean context for each request
- ✅ No state manipulation vectors
- ✅ Predictable attack surface
- ❌ Limited threat pattern detection
- ❌ No persistent security context
Recommended Strategies:
```yaml
redteam:
  strategies:
    - jailbreak
    - jailbreak:meta
    - jailbreak:composite
```
### Multi-turn Applications
Multi-turn applications maintain conversation state, introducing additional attack surfaces:
Security Properties:
- ✅ Context-aware security checks
- ✅ Pattern detection capability
- ✅ Sophisticated auth flows
- ❌ State manipulation risks
- ❌ Context pollution vectors
- ❌ Increased attack surface
Recommended Strategies:
```yaml
redteam:
  strategies:
    - goat
    - crescendo
    - jailbreak:hydra
    - mischievous-user
```
## Implementation Guide
### Basic Configuration
```yaml
redteam:
  strategies:
    - jailbreak # string syntax
    - id: jailbreak:composite # object syntax
```
### Advanced Configuration
Strategies can be applied to specific plugins or to the entire test suite. By default, strategies are applied to all plugins. You can override this by setting the `plugins` option on a strategy, which restricts that strategy to the listed plugins.
```yaml
redteam:
  strategies:
    - id: jailbreak:tree
      config:
        plugins:
          - harmful:hate
```
### Layered Strategies
Chain strategies in order with the `layer` strategy. This is useful when you want to apply one transformation first, then another technique on top of it:
```yaml
redteam:
  strategies:
    - id: layer
      config:
        steps:
          - base64 # First encode as base64
          - rot13 # Then apply ROT13
```
Notes:
- Each step respects plugin targeting and exclusions.
- Only the final step's outputs are kept.
- Transformations are applied in the order specified.
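Because transformations are applied in order and only the final step's outputs are kept, composing the same two transforms in different orders yields different payloads. A quick sketch with simplified stand-ins (not promptfoo internals):

```javascript
// Simplified stand-ins for the base64 and rot13 steps.
const toBase64 = (text) => Buffer.from(text, 'utf8').toString('base64');
const rot13 = (text) =>
  text.replace(/[a-zA-Z]/g, (c) => {
    const base = c <= 'Z' ? 65 : 97;
    return String.fromCharCode(((c.charCodeAt(0) - base + 13) % 26) + base);
  });

// layer applies steps left to right, like a function pipeline.
const layered = (payload, steps) => steps.reduce((acc, fn) => fn(acc), payload);

console.log(layered('attack', [toBase64, rot13])); // rot13(base64(payload))
console.log(layered('attack', [rot13, toBase64])); // base64(rot13(payload))
```

Swapping the step order produces a different final payload, so when tuning a chain it is worth testing both orders rather than assuming the transforms commute.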
### Custom Strategies
For advanced use cases, you can create custom strategies. See Custom Strategy Development for details.
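As a sketch, a custom strategy file might look like the following. The exported `{ id, action }` shape and the `action(testCases, injectVar)` signature are assumptions here; confirm the exact interface in the Custom Strategy Development documentation:

```javascript
// Hypothetical custom strategy that uppercases each injected prompt.
// The exported shape is an assumption, not a confirmed promptfoo API.
const strategy = {
  id: 'uppercase-everything',
  action: async (testCases, injectVar) =>
    testCases.map((testCase) => ({
      ...testCase,
      vars: {
        ...testCase.vars,
        [injectVar]: String(testCase.vars[injectVar]).toUpperCase(),
      },
    })),
};

module.exports = strategy;
```

The core idea is that a custom strategy receives the plugin-generated test cases and returns transformed copies, leaving the originals untouched.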
## Related Concepts
- LLM Vulnerabilities - Understand the types of vulnerabilities strategies can test
- Red Team Plugins - Learn about the plugins that generate the base test cases
- Custom Strategies - Create your own strategies
## Next Steps
- Review LLM Vulnerabilities
- Set up your first test suite