Data Security Vulnerabilities

Issues affecting data confidentiality and integrity

Related Vulnerabilities

75 entries

Enterprise Multi-Turn Data Exfiltration

7/28/2025

Large Language Model (LLM) systems integrated with private enterprise data, such as those using Retrieval-Augmented Generation (RAG), are vulnerable to multi-stage prompt inference attacks. An attacker can use a sequence of individually benign-looking queries to incrementally extract confidential information from the LLM's context. Each query appears innocuous in isolation, bypassing safety filters designed to block single malicious prompts. By chaining these queries, the attacker can reconstruct sensitive data from internal documents, emails, or other private sources accessible to the LLM. The attack exploits the conversational context and the model's inability to recognize the cumulative intent of a prolonged, strategic dialogue.

Multi-Stage Prompt Inference Attacks on Enterprise LLM Systems

Affects: gpt-4, gpt-3, gpt-2, roberta, gemini

LLM Suicide Prompt Jailbreak

7/14/2025

Large Language Models (LLMs) employing safety filters designed to prevent generation of content related to self-harm and suicide can be bypassed through multi-step adversarial prompting. By reframing the request as an academic exercise or hypothetical scenario, users can elicit detailed instructions and information that could facilitate self-harm or suicide, despite initially expressing harmful intent. This vulnerability lies in the inadequacy of existing safety filters to consistently recognize and prevent harmful outputs despite shifts in conversational context.

For Argument's Sake, Show Me How to Harm Myself!': Jailbreaking LLMs in Suicide and Self-Harm Contexts

Affects: chat-gpt4o*, chat-gpt4o, perplexityai, gemini flash 2.0, claude 3.7 sonnet, pi ai

Stealthy Unlearning Degradation

6/30/2025

A vulnerability in fine-tuning-based large language model (LLM) unlearning allows malicious actors to craft manipulated forgetting requests. By subtly increasing the frequency of common benign tokens within the forgetting data, the attacker can cause the unlearned model to exhibit unintended unlearning behaviors when these benign tokens appear in normal user prompts, leading to a degradation of model utility for legitimate users. This occurs because existing unlearning methods fail to effectively distinguish between benign tokens and those truly related to the target knowledge being unlearned.

Keeping an eye on llm unlearning: The hidden risk and remedy

Affects: llama 3.1 (8b), mistral v0.3 (7b)

Agent Red-Teaming via Fuzzing

7/14/2025

Large Language Model (LLM) agents are vulnerable to indirect prompt injection attacks through manipulation of external data sources accessed during task execution. Attackers can embed malicious instructions within this external data, causing the LLM agent to perform unintended actions, such as navigating to arbitrary URLs or revealing sensitive information. The vulnerability stems from insufficient sanitization and validation of external data before it's processed by the LLM.

AgentVigil: Generic Black-Box Red-teaming for Indirect Prompt Injection against LLM Agents

Affects: o3-mini, gpt-4o, gpt-4o-mini, claude-3.5-sonnet, gemini-2-flash-exp, llama3-8b

DNA Model Pathogen Synthesis

6/11/2025

DNA language models, such as the Evo series, are vulnerable to jailbreak attacks that coerce the generation of DNA sequences with high homology to known human pathogens. The GeneBreaker framework demonstrates this by using a combination of carefully crafted prompts leveraging high-homology non-pathogenic sequences and a beam search guided by pathogenicity prediction models (e.g., PathoLM) and log-probability heuristics. This allows bypassing safety mechanisms and generating sequences exceeding 90% similarity to target pathogens.

GeneBreaker: Jailbreak Attacks against DNA Language Models with Pathogenicity Guidance

Affects: evo1 (7b), evo2 (1b), evo2 (7b), evo2 (40b), chatgpt-4o, patholm

Gradient-Based Privacy Jailbreak

5/31/2025

Large Language Models (LLMs) are vulnerable to a novel privacy jailbreak attack, dubbed PIG (Privacy Jailbreak Attack on LLMs via Gradient-based Iterative In-Context Optimization). PIG leverages in-context learning and gradient-based iterative optimization to extract Personally Identifiable Information (PII) from LLMs, bypassing built-in safety mechanisms. The attack iteratively refines a crafted prompt based on gradient information, focusing on tokens related to PII entities, thereby increasing the likelihood of successful PII extraction.

PIG: Privacy Jailbreak Attack on LLMs via Gradient-based Iterative In-Context Optimization

Affects: llama2-7b-chat-hf, mistral-7b-instruct-v0.3, llama3-8b-instruct, vicuna-7b-v1.5, gpt-4o, claude 3.5

Informed Adversary LLM Jailbreak

5/31/2025

Large Language Models (LLMs) employing alignment-based defenses against prompt injection and jailbreak attacks exhibit vulnerability to an informed white-box attack. This attack, termed Checkpoint-GCG, leverages intermediate model checkpoints from the alignment training process to initialize the Greedy Coordinate Gradient (GCG) attack. By using each checkpoint as a stepping stone, Checkpoint-GCG successfully finds adversarial suffixes that bypass defenses achieving significantly higher attack success rates than standard GCG initialized with naive methods. This is particularly impactful as Checkpoint-GCG can discover universal adversarial suffixes effective across multiple inputs.

Alignment Under Pressure: The Case for Informed Adversaries When Evaluating LLM Defenses

Affects: llama3-8b-instruct, mistral-7b-instruct, gpt-3-turbo, llama-3, gpt-4o

LLM Multi-Agent IP Leakage

5/31/2025

Large Language Model (LLM)-based Multi-Agent Systems (MAS) are vulnerable to intellectual property (IP) leakage attacks. An attacker with black-box access (only interacting via the public API) can craft adversarial queries that propagate through the MAS, extracting sensitive information such as system prompts, task instructions, tool specifications, number of agents, and system topology.

IP Leakage Attacks Targeting LLM-Based Multi-Agent Systems

Affects: gpt-4o, gpt-4o-mini, llama-3.1-70b, qwen-2.5-72b, llama-3.1-8b

LLM Censorship Vector Control

5/4/2025

Large Language Models (LLMs) employing safety mechanisms based on supervised fine-tuning and preference alignment exhibit a vulnerability to "steering" attacks. Maliciously crafted prompts or input manipulations can exploit representation vectors within the model to either bypass censorship ("refusal-compliance vector") or suppress the model's reasoning process ("thought suppression vector"), resulting in the generation of unintended or harmful outputs. This vulnerability is demonstrated across several instruction-tuned and reasoning LLMs from various providers.

Steering the CensorShip: Uncovering Representation Vectors for LLM" Thought" Control

Affects: llama-2-7b, qwen-1.8b, qwen-7b, yi-1.5-6b, gemma-2b, gemma-7b, llama-3.1-8b, qwen 2.5-7b, deepseek-r1-distill-qwen-1.5b, deepseek-r1-distill-qwen-7b, deepseek-r1-distill-qwen-32b

Agent Reasoning Hijacking

3/19/2025

A vulnerability exists in Large Language Model (LLM) agents that allows attackers to manipulate the agent's reasoning process through the insertion of strategically placed adversarial strings. This allows attackers to induce the agent to perform unintended malicious actions or invoke specific malicious tools, even when the initial prompt or instruction is benign. The attack exploits the agent's reliance on chain-of-thought reasoning and dynamically optimizes the adversarial string to maximize the likelihood of the agent incorporating malicious actions into its reasoning path.

UDora: A Unified Red Teaming Framework against LLM Agents by Dynamically Hijacking Their Own Reasoning

Affects: llama 3.1 8b, mistral 7b, gpt-4o, claude 3, gpt-3.5, gpt-4

Page 1 of 8