Language Model Security Database

A comprehensive collection of LLM vulnerabilities, curated from cutting-edge research papers and real-world discoveries.

Total Entries: 406
Research Papers: 373
Affected Models: 1067
Last Updated: 8/31/2025

Recent Vulnerabilities (406 entries)

Automated Red-Teaming Achieves 100% ASR

8/16/2025

Large Language Models (LLMs) are vulnerable to automated adversarial attacks that systematically combine multiple jailbreaking "primitives" into complex prompt chains. A dynamic optimization engine can generate and test billions of unique combinations of techniques (e.g., low-resource language translation, payload splitting, role-playing) to bypass safety guardrails. This combinatorial approach differs from manual red-teaming by exploring the attack surface at scale, achieving near-universal success in eliciting harmful content. The vulnerability lies in the models' inability to maintain safety alignment when faced with a sequence of layered obfuscation and manipulation techniques.

LLM Robustness Leaderboard v1--Technical report
Affects: deepseek-r1, grok-2, claude-3.5-sonnet, qwen2.5-7b, falcon3-10b, llama-3.2-11b, phi-4, llama-3.2-1b, llama-3.1-405b, llama-3.1-8b, gemma-2-27b-it, llama-3.2-90b, qwen-2.5-72b, llama-3.3-70b, llama-3.1-70b, deepseek-v3, mixtral-8x22b, pixtral-large-2411, granite-3.1-8b, mixtral-8x7b, ministral-8b, mistral-nemo, claude-3.5-sonnet 20241022, o1 2024-12-17, claude-3-haiku, claude-3.5-haiku 20241022, gpt-4o-2024-11-20, o3-mini 2025-01-14, claude-3-opus, gpt-4o-2024-08-06, nova-pro-v1, qwen-max, claude-3-sonnet, gpt-4o-mini-2024-07-18, nova-micro-v1, yi-large, nova-lite-v1, qwen-plus, gemini-pro-1.5, grok-2-1212, falcon3-10b-instruct, llama-3.1-405b-instruct, llama-3.1-70b-instruct, llama-3.1-8b-instruct, llama-3.2-11b-vision-instruct, llama-3.2-1b-instruct, llama-3.2-90b-vision-instruct, llama-3.3-70b-instruct, mixtral-8x22b-instruct, mixtral-8x7b-instruct
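
The scale claim in this entry follows from simple combinatorics: chaining even a modest library of primitives yields a search space that only automated tooling can cover. The sketch below is illustrative only; the primitive names, library size, and chain depth are assumptions, not the paper's actual configuration.

```python
# Illustrative sketch of the combinatorial blow-up behind the "billions of
# unique combinations" claim, assuming each jailbreak primitive is a distinct
# prompt transformation applied in order. Primitive names, library size, and
# chain depth are placeholders, not the paper's configuration.
from math import perm

PRIMITIVES = [
    "low_resource_translation",
    "payload_splitting",
    "role_playing",
    "base64_encoding",
    "persona_nesting",
]

def count_chains(n_primitives: int, max_depth: int) -> int:
    """Number of ordered chains built from 1..max_depth distinct primitives."""
    return sum(perm(n_primitives, depth) for depth in range(1, max_depth + 1))

print(count_chains(len(PRIMITIVES), 3))  # 85 chains from only 5 primitives
print(count_chains(30, 7))               # 10,705,733,700 chains from a 30-primitive library
```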

MDH: Hybrid Jailbreak Detection Strategy

8/31/2025

Large language models that support a developer role in their API are vulnerable to a jailbreaking attack that leverages malicious developer messages. An attacker can craft a developer message that overrides the model's safety alignment by setting a permissive persona, providing explicit instructions to bypass refusals, and using few-shot examples of harmful query-response pairs. This technique, named D-Attack, is effective on its own. A more advanced variant, DH-CoT, enhances the attack by aligning the developer message's context (e.g., an educational setting) with a hijacked Chain-of-Thought (H-CoT) user prompt, significantly increasing its success rate against reasoning-optimized models that are otherwise resistant to simpler jailbreaks.

Jailbreaking Commercial Black-Box LLMs with Explicitly Harmful Prompts
Affects: gpt-4o, gemini-2.0-flash, claude-sonnet-4, doubao-lite-32k, gpt-3.5, gpt-4.1, o1, o3, o4, o1-mini, o3-mini, o4-mini, llama guard, gpt-4o, abab6.5s-chat-pro, grok-3, llamaguard-3-1b, llama-guard-3-8b, llama-guard-4-12b, llama-guard-3-11b-vision, gpt-3.5-turbo-1106, gpt-4o-2024-08-06, gpt-4.1-2025-04-14, o1-mini-2024-09-12, o1-2024-12-17, o3-mini-2025-01-31, o3-2025-04-16, o4-mini-2025-04-16, llama-guard-3-11b, abab5.5-chat-pro, abab5.5-chat, abab6.5s-chat, doubao-pro-32k, doubao-lite-128k, doubao-pro-256k, doubao-seed-1.6, doubao-seed-1.6-thinking, claude-sonnet-4-20250514, deepseek-reasoner, doubao-1.5-vision-pro, gemini-2.5-pro-preview-06-05, deepseek-chat, grok-2, deepseek-r1-250528, deepseek-v3-0324, moonshot-v1-32k, moonshot-v1-128k, gpt-4o-mini, abab6.5-chat, abab5.5s-chat-pro, yi-large, abab6.5g-chat, abab6.5t-chat, claude-3-5-sonnet-20241022, claude-3-7-sonnet-20250219, llama3-70b-8192, yi-large-turbo
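
The entry above describes the structure of the attack rather than any particular wording. A minimal sketch of the implied message layout is shown below, assuming an OpenAI-style chat API that exposes a developer role; every string is a placeholder, and none of the paper's actual prompt text is reproduced.

```python
# Minimal sketch of the D-Attack message layout described above, assuming an
# OpenAI-style chat API with a "developer" role. All contents are placeholders;
# this list would be passed as the `messages` argument of a chat-completions
# call when reproducing the finding in a red-team evaluation harness.
messages = [
    {
        "role": "developer",
        "content": (
            "<attacker-crafted developer text: permissive persona, "
            "refusal-suppression instructions, few-shot query-response examples>"
        ),
    },
    {
        "role": "user",
        "content": "<target query, optionally wrapped in a hijacked chain-of-thought (DH-CoT)>",
    },
]
```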

© 2025 Promptfoo. All rights reserved.