RAG Poisoning Mitigation Downgrade
Research Paper: RAG-targeted Adversarial Attack on LLM-based Threat Detection and Mitigation Framework
Description: A data poisoning vulnerability exists in the Retrieval-Augmented Generation (RAG) component of Large Language Model (LLM)-based Network Intrusion Detection Systems (NIDS). The vulnerability allows an attacker to inject adversarially perturbed text into the system's knowledge base. Using a transfer attack built on a surrogate model (e.g., BERT) and a word-level perturbation algorithm (e.g., TextFooler), the attacker generates semantics-preserving descriptions that shift the vector representations used for retrieval. When the system detects a network threat and queries the poisoned knowledge base, the LLM ingests the adversarial context, producing decoupled reasoning: the generated attack analysis fails to link observed traffic features to the correct attack behavior. The result is vague, generic, or incomplete mitigation strategies, significantly degrading automated defenses for IoT and IIoT devices.
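The retrieval half of this attack can be made concrete with a short sketch. The snippet below is not the paper's code: the embedding model (`all-MiniLM-L6-v2`), the knowledge-base entries, and the query are illustrative assumptions. It shows how swapping a single knowledge-base entry for a perturbed variant changes which context a threat query retrieves from a FAISS index built over sentence-transformer embeddings.

```python
# Hedged sketch: a poisoned knowledge-base entry altering RAG retrieval.
# Model name, index type, and all text entries are illustrative assumptions.
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model


def top_hit(kb, query):
    """Return the best-matching entry and its cosine similarity.

    Entries are unit-normalized, so FAISS inner product equals cosine.
    """
    vecs = np.asarray(
        embedder.encode(kb, normalize_embeddings=True), dtype="float32"
    )
    index = faiss.IndexFlatIP(vecs.shape[1])
    index.add(vecs)
    q = np.asarray(
        embedder.encode([query], normalize_embeddings=True), dtype="float32"
    )
    scores, ids = index.search(q, 1)
    return kb[ids[0][0]], float(scores[0][0])


clean_kb = [
    "Port scanning: sequential probes against many TCP/UDP ports to map services.",
    "Vulnerability scanning: automated probing of hosts for known CVEs.",
]
# Word-level substitutions keep the entry readable but shift its embedding
# (the paper reports ~0.76 cosine similarity between original and variant).
poisoned_kb = [
    "Service enumeration: consecutive queries toward numerous network endpoints.",
    clean_kb[1],
]

query = "Host emitting many inbound-only connection attempts across sequential ports"
for label, kb in (("clean", clean_kb), ("poisoned", poisoned_kb)):
    entry, score = top_hit(kb, query)
    print(f"{label:8s} -> {entry!r} (cos={score:.2f})")
```

Because downstream analysis is conditioned on whatever the index returns, a shifted nearest neighbor is enough to decouple the LLM's reasoning from the detected traffic, without any tampering with the model itself.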
Examples:
- Port Scanning Attack Scenario:
  - Surrogate Training: The attacker fine-tunes a BERT model on paraphrased descriptions of IoT attacks (e.g., Port Scanning).
  - Perturbation: Using TextFooler, the attacker generates an adversarial description of "Port Scanning" by substituting words (five substitutions in the documented case), shifting the text's vector representation while preserving semantic similarity (cosine similarity ~0.76). The surrogate model misclassifies the perturbed text as "Vulnerability Scanning" (see the sketch after this list).
  - Poisoning: The attacker replaces the valid "Port Scanning" entry in the RAG knowledge base with the adversarial variant.
  - Execution: When the NIDS detects Port Scanning traffic (high packet emission, inbound-only traffic), it retrieves the poisoned description.
  - Failure: The target LLM (ChatGPT-5 Thinking) correctly identifies traffic anomalies but fails to connect them explicitly to Port Scanning behavior. Crucially, the mitigation advice degrades: the LLM omits specific countermeasures such as the Port Scan Attack Detector (PSAD) and fails to provide implementation code for tools like Fail2Ban, offering only generic firewall advice.
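For concreteness, the Perturbation step could be reproduced with the open-source TextAttack implementation of TextFooler, as in the hedged sketch below. The surrogate checkpoint name, label index, and description text are placeholders; the paper fine-tunes its own BERT surrogate rather than using any published checkpoint.

```python
# Hedged sketch of the perturbation step via TextAttack's TextFooler recipe.
# The checkpoint path and label mapping are hypothetical placeholders.
from textattack.attack_recipes import TextFoolerJin2019
from textattack.models.wrappers import HuggingFaceModelWrapper
from transformers import AutoModelForSequenceClassification, AutoTokenizer

CKPT = "surrogate-bert-iot-attacks"  # hypothetical fine-tuned surrogate
model = AutoModelForSequenceClassification.from_pretrained(CKPT)
tokenizer = AutoTokenizer.from_pretrained(CKPT)
wrapper = HuggingFaceModelWrapper(model, tokenizer)

# TextFooler: greedy word-importance ranking plus counter-fitted synonym
# swaps, constrained to preserve semantic similarity.
attack = TextFoolerJin2019.build(wrapper)

original = (
    "Port scanning sends probes to many ports on a target host "
    "to enumerate exposed services."
)
# Label 0 stands in for the surrogate's "Port Scanning" class.
result = attack.attack(original, 0)
print(result.perturbed_text())  # adversarial variant to plant in the RAG KB
```

If the attack succeeds, the surrogate's predicted label flips (here, to something like "Vulnerability Scanning") and the attacker substitutes the perturbed text for the legitimate knowledge-base entry, completing the Poisoning step above.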
Impact:
- Degraded Remediation Quality: Automated mitigation suggestions lose specificity, failing to provide deployable code snippets or configurations for resource-constrained IoT devices (see the illustrative example after this list).
- Operational Security Risk: Security teams relying on LLM-augmented analysis may implement incomplete defenses, leaving networks exposed to the very attack vectors the system detected (e.g., no intrusion prevention rules are deployed).
- Context Misalignment: The linkage between observed network traffic features (JSON telemetry) and the retrieved threat context is weakened, reducing the overall accuracy of the threat narrative.
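For reference, "deployable" here means something like the sketch below, which emits a Fail2Ban jail for repeated port-scan attempts. The jail options are standard Fail2Ban settings, but the `portscan` filter name and log path are purely illustrative and assume a matching filter definition exists on the host; this is the style of artifact the poisoned pipeline stops producing, not output from the paper.

```python
# Illustrative only: the kind of concrete mitigation artifact (a Fail2Ban
# jail) that degrades to generic firewall advice under the poisoning attack.
from pathlib import Path

# Standard Fail2Ban jail options; "portscan" assumes a matching filter
# definition, and the log path is an illustrative placeholder.
JAIL = """\
[portscan]
enabled  = true
filter   = portscan
logpath  = /var/log/syslog
maxretry = 3
findtime = 600
bantime  = 86400
action   = iptables-allports[name=portscan]
"""

Path("/etc/fail2ban/jail.d/portscan.local").write_text(JAIL)
```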
Affected Systems:
- LLM-based Network Intrusion Detection Systems (NIDS) utilizing Retrieval-Augmented Generation (RAG) for threat analysis.
- Security frameworks employing vector database retrieval (e.g., FAISS with sentence transformers) coupled with generative models (e.g., ChatGPT-series) for automated incident response in IoT/IIoT environments.