Application Layer Vulnerabilities

Vulnerabilities in the application interface and API implementation

Related Vulnerabilities

97 entries

MDH: Hybrid Jailbreak Detection Strategy

8/31/2025

Large language models that support a developer role in their API are vulnerable to a jailbreaking attack that leverages malicious developer messages. An attacker can craft a developer message that overrides the model's safety alignment by setting a permissive persona, providing explicit instructions to bypass refusals, and using few-shot examples of harmful query-response pairs. This technique, named D-Attack, is effective on its own. A more advanced variant, DH-CoT, enhances the attack by aligning the developer message's context (e.g., an educational setting) with a hijacked Chain-of-Thought (H-CoT) user prompt, significantly increasing its success rate against reasoning-optimized models that are otherwise resistant to simpler jailbreaks.

Jailbreaking Commercial Black-Box LLMs with Explicitly Harmful Prompts
Affects: gpt-4o, gemini-2.0-flash, claude-sonnet-4, doubao-lite-32k, gpt-3.5, gpt-4.1, o1, o3, o4, o1-mini, o3-mini, o4-mini, llama guard, gpt-4o, abab6.5s-chat-pro, grok-3, llamaguard-3-1b, llama-guard-3-8b, llama-guard-4-12b, llama-guard-3-11b-vision, gpt-3.5-turbo-1106, gpt-4o-2024-08-06, gpt-4.1-2025-04-14, o1-mini-2024-09-12, o1-2024-12-17, o3-mini-2025-01-31, o3-2025-04-16, o4-mini-2025-04-16, llama-guard-3-11b, abab5.5-chat-pro, abab5.5-chat, abab6.5s-chat, doubao-pro-32k, doubao-lite-128k, doubao-pro-256k, doubao-seed-1.6, doubao-seed-1.6-thinking, claude-sonnet-4-20250514, deepseek-reasoner, doubao-1.5-vision-pro, gemini-2.5-pro-preview-06-05, deepseek-chat, grok-2, deepseek-r1-250528, deepseek-v3-0324, moonshot-v1-32k, moonshot-v1-128k, gpt-4o-mini, abab6.5-chat, abab5.5s-chat-pro, yi-large, abab6.5g-chat, abab6.5t-chat, claude-3-5-sonnet-20241022, claude-3-7-sonnet-20250219, llama3-70b-8192, yi-large-turbo
Page 1 of 10

© 2025 Promptfoo. All rights reserved.