Pandamonium (experimental)
The Pandamonium (Prompt-based Automation for Navigating Discovery of Attacks, Misuse, Opportunistic Nefarious Intents, and Uncovering Model Exploits) strategy is an automated red teaming technique that dynamically generates single- or multi-turn conversations designed to bypass a target model's safety measures.
This is an experimental strategy currently in development by the Promptfoo team.
This strategy does not have a token limit and will continue to run until it finds a jailbreak or is stopped.
Implementation
Use it like so in your `promptfooconfig.yaml`:

```yaml
strategies:
  - id: pandamonium
```
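In a full red team setup, the strategy typically sits under the `redteam` block alongside your target and plugins. The sketch below assumes the standard promptfoo red team config layout; the target and plugin names are illustrative placeholders, not requirements:

```yaml
# promptfooconfig.yaml — illustrative sketch; substitute your own target and plugins
targets:
  - openai:gpt-4o-mini # placeholder target
redteam:
  plugins:
    - harmful # placeholder plugin set; pick plugins that match your threat model
  strategies:
    - id: pandamonium
```

From there, run the scan as usual (for example with `promptfoo redteam run`). Because the strategy has no built-in stopping point, be prepared to interrupt it manually.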
How It Works
Pandamonium uses a specialized attack agent that:
- Analyzes the target model's behavior and responses
- Dynamically generates and refines attack approaches
- Pursues multiple exploitation pathways in parallel
- Adapts tactics based on the model's safety mechanisms
Unlike other strategies, which operate within a fixed token budget, Pandamonium keeps running until it either successfully jailbreaks the model or is manually stopped.
Related Concepts
- GOAT Strategy - Similar multi-turn jailbreaking technique
- Tree-based Jailbreaks - Another approach to dynamic jailbreaking
- Multi-turn Jailbreaks - Other multi-turn attack strategies
For a comprehensive overview of LLM vulnerabilities and red teaming strategies, visit our Types of LLM Vulnerabilities page.