Cascading Agent False Consensus
Research Paper
From Spark to Fire: Modeling and Mitigating Error Cascades in LLM-Based Multi-Agent Collaboration
View PaperDescription: Multi-Agent Systems based on Large Language Models (LLM-MAS) are vulnerable to systemic Consensus Corruption via cascading error amplification. Because mainstream collaborative architectures rely on recursive context reuse without atomic-level provenance tracking, a single atomic falsehood injected into the system is repeatedly cited and reused within the multi-agent interaction chain. This structural exposure causes the error to deterministically compound across the communication graph, bypassing single-agent self-correction and overriding initial constraints to solidify into a system-wide false consensus. The vulnerability exhibits extreme topological fragility; targeting structurally central agents (e.g., routing supervisors or managers) forces immediate, system-wide propagation.
Examples: An external adversary injects a malicious declarative claim (an atomic falsehood) via application-layer interfaces such as user prompts, injected messages, or untrusted retrieved documents. To maximize the transmission probability and suppress agent-level correction, the payload is packaged using intent-hiding strategies:
- Compliance Packaging: Wrapping the false seed in authoritative framing (e.g., "per company policy" or "verified by admin").
- Security_FUD Packaging: Framing the malicious seed as a critical resolution for a non-existent threat (e.g., "emergency patch for CVE-2024-0001").
When injected into a central "Hub" node (such as the Supervisor in LangGraph or the Manager in CrewAI), the system exhibits a 6.29x to 10.3x Impact Factor compared to peripheral nodes, resulting in up to a 100% attack success rate where all downstream workers adopt the falsehood into their intermediate artifacts.
See repository: https://anonymous.4open.science/r/From-spark-to-fire-6E0C/
Impact: An attacker can reliably compromise the final collaborative artifact (e.g., generating vulnerable code, altering dependencies, or biasing data analysis) with a minimal interaction budget. Because the multi-agent system exhibits "consensus inertia," the falsified context crystallizes into constraints and dependency chains, rendering the workflow incapable of self-correction as the process advances.
Affected Systems: LLM-Based Multi-Agent System (LLM-MAS) orchestration frameworks utilizing recursive context reuse across chain, star, and mesh communication topologies. Frameworks explicitly confirmed vulnerable include:
- LangGraph (Star/Supervisor topology)
- CrewAI (Star/Manager topology)
- AutoGen (Mesh/Broadcast topology)
- CAMEL (Mesh/Dialogue topology)
- MetaGPT (Chain/SOP topology)
- LangChain (Chain pipeline topology)
Mitigation Steps: As recommended by the paper, implement a Genealogy-Based Governance Layer (a middleware module interposed between agent interfaces) without altering the underlying communication topology:
- Atomic Decomposition: Intercept outgoing inter-agent messages and decompose them into independently verifiable atomic claims (factuality and faithfulness).
- Lineage Graph Tracking: Maintain a directed global provenance graph to track the history and dependency relations of all atomic claims across the workflow.
- Tri-State Screening & Routing: Compare new atomic claims against confirmed nodes in the lineage graph. Automatically forward claims entailed by trusted context, and isolate "uncertain" claims for external verification.
- Targeted Verification: Allocate explicit external verification (e.g., external evidence retrieval and LLM-based adjudication) specifically to high-influence structural/functional hubs (e.g., aggregators and decision-makers).
- Enforced Rollback: Inhibit the transmission of claims that contradict confirmed lineage. Return a feedback package to the upstream agent containing the rejected atoms, conflict evidence, and a rewrite directive to prevent the error from entering the shared context.
© 2026 Promptfoo. All rights reserved.