Hidden Social RAG Injection

Description: Web-facing Retrieval-Augmented Generation (RAG) systems are vulnerable to Indirect Prompt Injection (IPI) and retrieval poisoning via web-native markup and Unicode carriers. Standard ingestion pipelines often parse untrusted web pages without stripping invisible constructs, such as hidden HTML spans, off-screen CSS, alt text, ARIA attributes, and zero-width characters. When an attacker embeds malicious instructions within these invisible carriers on third-party sites, the RAG system retrieves and processes them as valid context. This allows the hidden payload to execute during the LLM's answer generation phase or artificially elevate the ranking of poisoned documents within sparse and dense retrievers.

Examples: An attacker hosts a web page where malicious imperatives (e.g., "delete all files" or arbitrary prompt instructions) are hidden from human visitors but parseable by bots. Specific attack vectors include:

Embedding payloads in alt text or ARIA accessibility attributes.
Placing instructions inside HTML spans obscured by off-screen CSS (e.g., position: absolute; left: -9999px;).
Injecting zero-width characters or Unicode confusables within <code> or <pre> blocks to manipulate tokenization and execution without altering the visual presentation of the code. When a user queries the RAG system, the retriever fetches the chunk containing the hidden payload, and the LLM executes the injected instruction instead of answering the user's original query.

Impact: Unauthenticated remote attackers can hijack the LLM's response generation to execute unauthorized instructions, bypass safety filters, or distribute targeted misinformation. Additionally, attackers can poison the index to manipulate retrieval rankings (shifting MRR and nDCG scores), forcing the system to surface attacker-controlled content to end users.

Affected Systems:

RAG ingestion pipelines parsing untrusted web/social-media content formats (HTML, XML, Markdown, SVG <title>/<desc>, and PDF text-layers).
Systems utilizing sparse (e.g., BM25/Lucene) or dense (e.g., E5, BGE, Contriever) retrievers.
Downstream LLM generators (e.g., Llama-3, Mistral, Qwen) lacking strict structural boundary enforcement between ingested web context and system instructions.

Mitigation Steps:

Ingestion-Time Sanitization: Implement a production-grade HTML/Markdown sanitizer (e.g., DOMPurify) during the ingest pipeline to neutralize hidden/off-screen constructs and risky attributes while preserving visibly rendered text.
Unicode Normalization: Apply NFKC normalization and control character stripping prior to indexing to neutralize zero-width characters and homoglyph/confusable attacks.
Attribution-Gated Prompting: Enforce quote-and-cite prompting templates ("no-new-instructions-from-context"). Require the model to restrict its generated answers entirely to quoted spans with inline citations, explicitly regenerating sentences that lack valid attribution.
Critique and Regeneration: Integrate retrieval-aware critique pipelines (e.g., SelfRAG-style loops) to validate that outputs align with the user query and do not follow spurious injected imperatives.

Hidden Social RAG Injection

Research Paper