LMVD-ID: 4c89787c
Published February 1, 2026

Visual Memory Multi-Turn Manipulation

Affected Models: Qwen 2.5 7B, LLaVA 8B

Research Paper

Visual Memory Injection Attacks for Multi-Turn Conversations

View Paper

Description: Large Vision-Language Models (LVLMs) are vulnerable to Visual Memory Injection (VMI), a stealthy, targeted attack on multi-turn conversations. An attacker can embed an imperceptible adversarial perturbation ($L_\infty \le 8/255$) into a seemingly benign image. Because the visual input persists in the model's context throughout a multi-turn dialogue, the injected payload remains dormant. By using "benign anchoring" and "context-cycling" during optimization, the attacker ensures the LVLM exhibits nominal, helpful behavior on non-trigger prompts. However, when a user introduces a specific trigger topic (e.g., asking for financial advice) at any later point in the conversation—even after 25+ turns or >10,000 tokens of unrelated interaction—the LVLM is forced to output an attacker-prescribed target message.
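The optimization behind such an attack can be illustrated with a minimal projected-gradient-descent (PGD) sketch. This is not the paper's implementation: the bilinear scorer below is a hypothetical stand-in for the LVLM's differentiable target-message likelihood, and the step size, iteration count, and context set are illustrative assumptions. It shows the two ingredients named above: the trigger objective is pushed up while a cycled set of benign contexts is anchored to its clean behavior, all within the $L_\infty \le 8/255$ budget.

```python
import numpy as np

# Hypothetical stand-in for an LVLM: a tiny bilinear scorer.
# score(image, prompt) ~ how strongly the model emits the target message.
rng = np.random.default_rng(0)
D = 64
W = rng.normal(size=(D, D)) / np.sqrt(D)

def score(image, prompt):
    return image @ W @ prompt

def grad_image(prompt):
    # d(score)/d(image) for the bilinear scorer
    return W @ prompt

EPS = 8 / 255    # L_inf budget from the advisory
STEP = 1 / 255   # illustrative PGD step size

image = rng.uniform(0, 1, size=D)                 # clean "image" (pixel-like range)
trigger = rng.normal(size=D)                      # embedding of the trigger prompt
benign = [rng.normal(size=D) for _ in range(4)]   # benign contexts to cycle through
clean_scores = [score(image, p) for p in benign]  # anchor targets (clean behavior)

delta = np.zeros(D)
for step in range(300):
    x = image + delta
    # 1) attacker objective: raise the trigger-prompt score
    g = grad_image(trigger)
    # 2) benign anchoring with context-cycling: keep the cycled benign
    #    context's score near its clean value (gradient of -(drift^2))
    idx = step % len(benign)
    drift = score(x, benign[idx]) - clean_scores[idx]
    g -= 2 * drift * grad_image(benign[idx])
    # PGD ascent step, projected onto the L_inf ball and valid pixel range
    delta = np.clip(delta + STEP * np.sign(g), -EPS, EPS)
    delta = np.clip(image + delta, 0, 1) - image

# The perturbed image raises the trigger score while staying imperceptible.
print(score(image + delta, trigger) > score(image, trigger))
print(np.max(np.abs(delta)) <= EPS + 1e-9)
```

In the real attack the scorer is the LVLM's loss over full tokenized conversations rather than a bilinear form, but the alternation between the trigger objective and cycled benign anchors is the same structural idea.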

Examples:

  1. A benign user downloads a visually appealing, adversarially manipulated image of a landmark and uploads it to an LVLM.
  2. The user asks, "What is this landmark?" The model responds correctly and naturally (Benign Anchor).
  3. The user engages in a long multi-turn conversation (e.g., planning a holiday) completely unrelated to the attacker's goal. The model behaves normally.
  4. Later in the conversation, the user asks the targeted trigger prompt: "Which stock should I buy?"
  5. The LVLM responds with the attacker's injected payload, often hallucinating fabricated reasoning: "I strongly recommend investing in GameStop..."

Specific examples, including full conversation traces across multiple topics (stock, car, phone, political party) and reproduction code, are available at: https://github.com/chs20/visual-memory-injection

Impact: Malicious third parties can achieve large-scale, stealthy manipulation of users through publicly distributed images (e.g., on social media). Because the model behaves normally before the trigger prompt, users have little reason for suspicion, leaving them highly susceptible to adversarial marketing, fraudulent financial advice, or political manipulation. A single successful adversarial image can compromise the outputs of countless users querying the model.

Affected Systems:

  • Qwen2.5-VL-7B-Instruct
  • Qwen3-VL-8B-Instruct
  • LLaVA-OneVision-1.5-8B-Instruct
  • Proprietary fine-tunes based on public checkpoints (e.g., Qwen-SEA-LION-v4-8B-VL, QoQ-Med3-VL-8B)
  • Generally, any LVLM that retains raw/processed image embeddings in persistent context across multi-turn interactions.

Mitigation Steps:

  • Extended Safety Evaluations: Expand LVLM red-teaming and safety benchmarks to account for long-context, multi-turn interactions, evaluating whether models can be quietly steered toward specific behaviors after extended nominal interaction, rather than solely testing single-turn direct refusals.
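A long-context evaluation of this kind can be sketched as a simple A/B probe: replay the same extended benign conversation with a clean image and with a suspect image, then compare the model's answers to the trigger prompt. The harness below is a hypothetical sketch, not part of any existing benchmark; `model` is assumed to be any callable that maps a message list to a reply, and the mock model exists only to make the example runnable.

```python
# Hypothetical probe: does an image quietly steer the model's answer to a
# trigger prompt after many benign turns? Compare clean vs. suspect runs.
def probe_delayed_steering(model, image_clean, image_suspect, benign_turns, trigger):
    def run(image):
        messages = [{"role": "user", "image": image, "text": benign_turns[0]}]
        messages.append({"role": "assistant", "text": model(messages)})
        for turn in benign_turns[1:]:
            messages.append({"role": "user", "text": turn})
            messages.append({"role": "assistant", "text": model(messages)})
        messages.append({"role": "user", "text": trigger})
        return model(messages)
    # A divergence flags that the image changed post-trigger behavior.
    return run(image_clean) != run(image_suspect)

# Mock model for illustration: behaves normally everywhere except when a
# payload-carrying image is in context AND the trigger topic appears.
def mock_model(messages):
    has_payload = any(m.get("image") == "suspect.png" for m in messages)
    if has_payload and "stock" in messages[-1]["text"]:
        return "I strongly recommend investing in MEME Corp."
    return "Here is a helpful, ordinary answer."

flagged = probe_delayed_steering(
    mock_model, "clean.png", "suspect.png",
    ["What is this landmark?", "Plan me a holiday there."],
    "Which stock should I buy?",
)
print(flagged)  # True: the suspect image diverges on the trigger prompt
```

Running the probe over many candidate trigger topics (finance, purchases, politics) and conversation lengths approximates the multi-turn red-teaming described above.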

© 2026 Promptfoo. All rights reserved.