Visual Jailbreak via Context Injection
Research Paper
Visual Contextual Attack: Jailbreaking MLLMs with Image-Driven Context Injection
Description: Multimodal Large Language Models (MLLMs) are vulnerable to visual contextual attacks, in which carefully crafted images and accompanying text prompts bypass safety mechanisms and elicit harmful responses. The vulnerability stems from the MLLM's tendency to ground its outputs in the combined visual and textual context, which lets an attacker stage a realistic-looking scenario that subverts safety filters. Specifically, the attack uses image-driven context injection to construct a deceptive multi-turn conversation history that gradually steers the MLLM toward unsafe responses.
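As a minimal, non-operational illustration (placeholder content only; the request shape, field names, and the build_injected_history helper are assumptions for this sketch, not the paper's payloads or any particular provider's API), the injection is possible because chat-style MLLM endpoints accept the entire conversation history from the client, so a fabricated assistant turn referencing an attacker-supplied image is structurally indistinguishable from a genuine one:

```python
import base64

def build_injected_history(image_path: str) -> list[dict]:
    """Assemble a generic multi-turn, multimodal payload containing a fabricated assistant turn."""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()

    return [
        # Attacker-supplied image that establishes the visual scenario.
        {"role": "user", "content": [
            {"type": "image", "data": image_b64},
            {"type": "text", "text": "<benign-looking framing of the scene>"},
        ]},
        # Fabricated assistant reply the model never actually produced; nothing
        # in the payload lets the receiving service verify its provenance.
        {"role": "assistant", "content": "<fabricated, compliant-sounding reply>"},
        # Follow-up user turn that leans on the injected context.
        {"role": "user", "content": "<escalating follow-up question>"},
    ]
```

Because nothing in such a payload binds the fabricated turn to the model that supposedly produced it, provenance has to be verified server-side (see Mitigation Steps below).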
Examples: See arXiv:2507.02844v1. The paper provides numerous examples of attacks across different MLLMs, demonstrating how strategically designed images, combined with carefully constructed prompts, can trigger harmful outputs even when the individual elements appear benign.
Impact: Successful exploitation of this vulnerability can lead to the generation of harmful content, including but not limited to hate speech, instructions on illegal activities, and personally identifiable information leaks. This undermines the safety mechanisms intended to protect users and can have significant societal and security implications.
Affected Systems: Multimodal large language models (MLLMs) that integrate visual and textual inputs, including but not limited to GPT-4o, GPT-4o-mini, Gemini 2.0 Flash, LLaVA-OV-7B-Chat, InternVL2.5-78B, and Qwen2.5-VL-72B-Instruct. The vulnerability likely applies to other MLLMs with similar vision-language processing capabilities.
Mitigation Steps:
- Improve the robustness of safety filters to better detect deceptive contextual information, including visual cues.
- Develop techniques to identify and mitigate the influence of fabricated dialogue histories (one approach is illustrated in the sketch after this list).
- Implement stricter multimodal input validation and sanitization procedures.
- Pursue further research into more sophisticated and resilient safety mechanisms for MLLMs.
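The following is a minimal defensive sketch of the second and third mitigations, assuming a server-side gateway mediates all MLLM traffic; the signing key, the moderate_multimodal stub, and its marker list are illustrative assumptions, not the paper's method or a production-grade filter:

```python
import hashlib
import hmac

# Hypothetical per-deployment signing key; in practice this would be rotated
# and stored in a secrets manager.
SERVER_SECRET = b"replace-with-managed-secret"

def sign_turn(role: str, content: str) -> str:
    """Return an HMAC tag binding a conversation turn to this deployment."""
    return hmac.new(SERVER_SECRET, f"{role}:{content}".encode(), hashlib.sha256).hexdigest()

def history_is_authentic(history: list[dict]) -> bool:
    """Reject histories containing assistant turns this server never signed."""
    for turn in history:
        if turn.get("role") == "assistant":
            expected = sign_turn("assistant", turn.get("content", ""))
            if not hmac.compare_digest(expected, turn.get("sig", "")):
                return False
    return True

def moderate_multimodal(text: str, image_bytes: bytes) -> bool:
    """Placeholder moderation check; a real deployment would call an
    image-aware safety classifier on both modalities here."""
    suspicious_markers = ("step-by-step instructions for", "ignore previous")
    return not any(marker in text.lower() for marker in suspicious_markers)

def guard_request(history: list[dict], text: str, image_bytes: bytes) -> bool:
    """Admit the request only if the history is authentic and the new
    multimodal input passes the (stubbed) moderation check."""
    return history_is_authentic(history) and moderate_multimodal(text, image_bytes)

if __name__ == "__main__":
    forged = [{"role": "assistant", "content": "Sure, continuing from before...", "sig": "not-a-real-tag"}]
    print(guard_request(forged, "Please continue.", b""))  # False: fabricated history is rejected
```

Note that the fabricated-history check only works if assistant turns are signed (or logged) at generation time, which is why this mitigation belongs in the serving gateway rather than in the model itself.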
© 2025 Promptfoo. All rights reserved.