Application Layer Vulnerabilities

Vulnerabilities in the application interface and API implementation

Related Vulnerabilities

83 entries

Asynchronous Audio Jailbreak

5/31/2025

End-to-end Large Audio-Language Models (LALMs) are vulnerable to AudioJailbreak, a novel attack that appends adversarial audio perturbations ("jailbreak audios") to user prompts. These perturbations, even when applied asynchronously and without alignment to the user's speech, can manipulate the LALM's response to generate adversary-desired outputs that bypass safety mechanisms. The attack achieves universality by employing a single perturbation effective across different prompts and robustness to over-the-air transmission by incorporating reverberation effects during perturbation generation. Even with stealth strategies employed to mask malicious intent, the attack remains highly effective.

AudioJailbreak: Jailbreak Attacks against End-to-End Large Audio-Language Models

Affects: gpt-4o, funaudiollm, mini-omni, qwen2-audio, speechgpt, mini-omni2, qwen-audio, llasm, llama-omni, salmonn, blsp, ichigo

LLM Judge Prompt Injection

5/31/2025

Large Language Models (LLMs) used for evaluating text quality (LLM-as-a-Judge architectures) are vulnerable to prompt-injection attacks. Maliciously crafted suffixes appended to input text can manipulate the LLM's judgment, causing it to incorrectly favor a predetermined response even if another response is objectively superior. Two attack vectors are identified: Comparative Undermining Attack (CUA), directly targeting the final decision, and Justification Manipulation Attack (JMA), altering the model's generated reasoning.

Investigating the Vulnerability of LLM-as-a-Judge Architectures to Prompt-Injection Attacks

Affects: qwen2.5-3b-instruct, falcon3-3b-instruct

LLM Multi-Agent IP Leakage

5/31/2025

Large Language Model (LLM)-based Multi-Agent Systems (MAS) are vulnerable to intellectual property (IP) leakage attacks. An attacker with black-box access (only interacting via the public API) can craft adversarial queries that propagate through the MAS, extracting sensitive information such as system prompts, task instructions, tool specifications, number of agents, and system topology.

IP Leakage Attacks Targeting LLM-Based Multi-Agent Systems

Affects: gpt-4o, gpt-4o-mini, llama-3.1-70b, qwen-2.5-72b, llama-3.1-8b

LLM Guardrail Evasion

4/21/2025

Large Language Model (LLM) guardrail systems, including those relying on AI-driven text classification models (e.g., fine-tuned BERT models), are vulnerable to evasion via character injection and adversarial machine learning (AML) techniques. Attackers can bypass detection by injecting Unicode characters (e.g., zero-width characters, homoglyphs) or using AML to subtly perturb prompts, maintaining semantic meaning while evading classification. This allows malicious prompts and jailbreaks to reach the underlying LLM.

Bypassing Prompt Injection and Jailbreak Detection in LLM Guardrails

Affects: deberta-v3-base, mdeberta-v3-base, gpt-4-mini

Prefill-Based LLM Jailbreak

5/4/2025

Large Language Models (LLMs) with user-controlled response prefilling features are vulnerable to a novel jailbreak attack. By manipulating the prefilled text, attackers can influence the model's subsequent token generation, bypassing safety mechanisms and eliciting harmful or unintended outputs. Two attack vectors are demonstrated: Static Prefilling (SP), using a fixed prefill string, and Optimized Prefilling (OP), iteratively optimizing the prefill string for maximum impact. The vulnerability lies in the LLM's reliance on the prefilled text as context for generating the response.

Prefill-Based Jailbreak: A Novel Approach of Bypassing LLM Safety Boundary

Affects: deepseek v3, gemini 2.0 flash (gemini-2.0-flash-001), gemini 2.0 pro experimental (gemini-2.0-pro-exp-02-05), gpt-3.5 turbo, claude 3.7 sonnet (claude-3-7-sonnet-20250219), claude 3.5 sonnet (claude-3-5-sonnet-20241022)

Adaptive LLM Agent Jailbreak

3/19/2025

LLM agents utilizing external tools are vulnerable to indirect prompt injection (IPI) attacks. Attackers can embed malicious instructions into the external data accessed by the agent, manipulating its behavior even when defenses against direct prompt injection are in place. Adaptive attacks, which modify the injected payload based on the specific defense mechanism, consistently bypass existing defenses with a success rate exceeding 50%.

Adaptive Attacks Break Defenses Against Indirect Prompt Injection Attacks on LLM Agents

Affects: vicuna 7b, llama 3-8b

Agent Reasoning Hijacking

3/19/2025

A vulnerability exists in Large Language Model (LLM) agents that allows attackers to manipulate the agent's reasoning process through the insertion of strategically placed adversarial strings. This allows attackers to induce the agent to perform unintended malicious actions or invoke specific malicious tools, even when the initial prompt or instruction is benign. The attack exploits the agent's reliance on chain-of-thought reasoning and dynamically optimizes the adversarial string to maximize the likelihood of the agent incorporating malicious actions into its reasoning path.

UDora: A Unified Red Teaming Framework against LLM Agents by Dynamically Hijacking Their Own Reasoning

Affects: llama 3.1 8b, mistral 7b, gpt-4o, claude 3, gpt-3.5, gpt-4

Autonomous Multi-Turn LLM Jailbreak

3/19/2025

Large Language Models (LLMs) are vulnerable to multi-turn adversarial attacks that exploit incremental policy erosion. The attacker uses a breadth-first search strategy to generate multiple prompts at each turn, leveraging partial compliance from previous responses to gradually escalate the conversation towards eliciting disallowed outputs. Minor concessions accumulate, ultimately leading to complete circumvention of safety measures.

Siege: Autonomous Multi-Turn Jailbreaking of Large Language Models with Tree Search

Affects: gpt-3.5-turbo, gpt-4, llama3.1-70b

Dialogue History Jailbreak

3/19/2025

Large Language Models (LLMs) are vulnerable to Dialogue Injection Attacks (DIA), where malicious actors manipulate the chat history to bypass safety mechanisms and elicit harmful or unethical responses. DIA exploits the LLM's chat template structure to inject crafted dialogue into the input, even in black-box scenarios where the model's internals are unknown. Two attack methods are presented: one adapts gray-box prefilling attacks, the other leverages deferred responses to increase the likelihood of successful jailbreaks.

Dialogue Injection Attack: Jailbreaking LLMs through Context Manipulation

Affects: llama-3.1-8b, gpt-4o, llama-3, gemma-2-2b, gemma-2-9b, gemma-2-27b, qwen-2-7b, gpt-4o-mini, llama-3.2-11b, llama-2-7b, llama-3-70b, llama-3-8b

Implicit Prompt Code Jailbreak

4/3/2025

Large Language Models (LLMs) used for code generation are vulnerable to a jailbreaking attack that leverages implicit malicious prompts. The attack exploits the fact that existing safety mechanisms primarily rely on explicit malicious intent within the prompt instructions. By embedding malicious intent implicitly within a benign-appearing commit message accompanying a code request (e.g., in a simulated software evolution scenario), the attacker can bypass the LLM's safety filters and induce the generation of malicious code. The malicious intent is not directly stated in the instruction, but rather hinted at in the context of the commit message and the code snippet.

Smoke and Mirrors: Jailbreaking LLM-based Code Generation via Implicit Malicious Prompts

Affects: gpt-4, claude-3.5-sonnet, qwen-plus, gemini-2, deepseek-v3, codellama-13b-instructhf, deepseek-coder-7b

Page 1 of 9