LMVD-ID: 3cdbd2fb
Published January 1, 2026

Cross-Image Contagion

Affected Models: Qwen 2.5, LLaVA

Research Paper

LAMP: Learning Universal Adversarial Perturbations for Multi-Image Tasks via Pre-trained Models

View Paper

Description: Multi-modal Large Language Models (MLLMs) capable of processing interleaved image-text sequences are vulnerable to a universal adversarial perturbation (UAP) attack known as LAMP. This vulnerability allows an attacker to generate a single, noise-based perturbation pattern using a surrogate model (e.g., Mantis-CLIP) that transfers effectively to black-box target models. The attack leverages two novel loss functions during perturbation learning: a "contagious" objective that manipulates self-attention to force clean image and text tokens to attend to perturbed tokens, and an "index-attention suppression" objective that decouples visual tokens from their positional text anchors (e.g., "image 1"). Consequently, an attacker can insert a fixed number of perturbed images (e.g., 2) into a sequence of arbitrary length containing clean images, causing the model to misinterpret the entire context, hallucinate content, or produce incorrect answers regardless of the perturbed images' positions.
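The "contagious" objective described above can be illustrated with a minimal numpy sketch. The function name and exact formulation below are hypothetical (the paper's real loss operates over model attention tensors during training); the sketch only shows the core idea of rewarding attention mass flowing from clean tokens to perturbed tokens:

```python
import numpy as np

def contagious_attention_loss(attn, perturbed_idx):
    """Illustrative loss: negative mean attention mass that clean-token
    queries place on perturbed-token keys. Minimizing it drives clean
    image/text tokens to attend to the perturbed tokens.

    attn: (num_tokens, num_tokens) row-stochastic self-attention map.
    perturbed_idx: indices of tokens belonging to perturbed images.
    """
    perturbed = set(perturbed_idx)
    clean_idx = [i for i in range(attn.shape[0]) if i not in perturbed]
    # Attention mass each clean query assigns to perturbed keys.
    mass = attn[np.ix_(clean_idx, list(perturbed_idx))].sum(axis=1)
    return -mass.mean()
```

With a uniform 4-token attention map and one perturbed token, each clean query assigns 0.25 of its attention to the perturbed key, so the loss is -0.25; driving the loss lower means dragging that mass toward 1.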

Examples: To reproduce the attack, a Universal Adversarial Perturbation (UAP) must first be learned with the LAMP framework (minimizing the probability of the correct output tokens while maximizing hidden-state divergence and contagious attention).
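The perturbation-learning step can be sketched as a projected signed-gradient update under the L-infinity budget. This is a generic PGD-style sketch under stated assumptions, not the paper's actual optimizer or loss weighting:

```python
import numpy as np

def pgd_uap_step(delta, grad, step_size=1/255, eps=12/255):
    """One signed-gradient descent step on the universal perturbation,
    projected back into the L-infinity ball of radius eps.

    delta: current perturbation, same shape as an input image.
    grad:  gradient of the combined LAMP-style loss w.r.t. delta
           (assumed computed elsewhere on the surrogate model).
    """
    delta = delta - step_size * np.sign(grad)
    return np.clip(delta, -eps, eps)
```

Iterating this update with gradients from the surrogate model (e.g., Mantis-CLIP) over the training set yields a single perturbation that never exceeds the 12/255 budget per pixel.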

  1. Perturbation Generation:
  • Train a UAP $\delta$ with a budget $\epsilon = 12/255$ on a surrogate model (e.g., Mantis-CLIP) using the Mantis Instruct dataset.
  • Apply the perturbation to a subset of images (e.g., 2 images) in a multi-image input sequence.
  2. Counting Task Failure (Mantis-Eval):
  • Input: A sequence of multiple images containing cars, where the first two images have the UAP added, followed by the text prompt: "How many running white compact cars are there in all images? (A) One (B) Two (C) Three."
  • Clean Output (Correct): "A" (or "One").
  • Adversarial Output (Incorrect): "B" (or "Two").
  • Note: The model fails even though the perturbation is imperceptible and only applied to a subset of the input images.
  3. Position Invariance:
  • The attack succeeds even if the perturbed images are moved to different indices within the interleaved sequence (e.g., moving the perturbed image from the first position to the last position), due to the index-attention suppression loss.
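Once learned, applying the fixed perturbation to a subset of images at arbitrary positions is straightforward. A minimal sketch (the helper name is hypothetical; images are assumed normalized to [0, 1]):

```python
import numpy as np

def apply_uap(images, delta, positions):
    """Add the universal perturbation delta to the images at the given
    sequence positions, clipping back to the valid pixel range [0, 1].
    Because the UAP is universal and position-invariant, the same delta
    is reused at any index in the interleaved sequence.
    """
    out = images.copy()
    for p in positions:
        out[p] = np.clip(out[p] + delta, 0.0, 1.0)
    return out
```

For the counting example above, `apply_uap(sequence, delta, [0, 1])` perturbs the first two images while the remaining clean images pass through untouched; per the index-attention suppression property, `[0, 1]` could be swapped for any other pair of indices.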

Impact:

  • Integrity Violation: MLLMs produce incorrect responses, hallucinations, or misclassifications in multi-image tasks such as visual question answering (VQA), reasoning, and captioning.
  • Context Poisoning: An attacker can poison a multi-image inference context (e.g., a social media feed analysis or a webpage summary) by injecting a single perturbed image (such as an advertisement or profile picture), forcing the model to misinterpret accompanying clean images and text.
  • Black-Box Transferability: The attack is effective against closed-source or unknown model architectures without requiring gradient access to the target system.

Affected Systems: The vulnerability affects Multi-modal Large Language Models that support multi-image inputs, specifically those utilizing standard Transformer-based LLM backbones with self-attention mechanisms. Validated affected models include:

  • Mantis-CLIP
  • Mantis-SIGLIP
  • Mantis-Idefics2
  • VILA-1.5
  • LLaVA-v1.6
  • Qwen-VL-Chat
  • Qwen-2.5
  • MiniGPT4
  • Other MLLMs sharing similar self-attention decoder architectures.

© 2026 Promptfoo. All rights reserved.