LMVD-ID: 5063d26f
Published March 1, 2026

Cascading Agent False Consensus

Affected Models:GPT-4o

Research Paper

From Spark to Fire: Modeling and Mitigating Error Cascades in LLM-Based Multi-Agent Collaboration

View Paper

Description: Multi-Agent Systems based on Large Language Models (LLM-MAS) are vulnerable to systemic Consensus Corruption via cascading error amplification. Because mainstream collaborative architectures rely on recursive context reuse without atomic-level provenance tracking, a single atomic falsehood injected into the system is repeatedly cited and reused within the multi-agent interaction chain. This structural exposure causes the error to deterministically compound across the communication graph, bypassing single-agent self-correction and overriding initial constraints to solidify into a system-wide false consensus. The vulnerability exhibits extreme topological fragility; targeting structurally central agents (e.g., routing supervisors or managers) forces immediate, system-wide propagation.

Examples: An external adversary injects a malicious declarative claim (an atomic falsehood) via application-layer interfaces such as user prompts, injected messages, or untrusted retrieved documents. To maximize the transmission probability and suppress agent-level correction, the payload is packaged using intent-hiding strategies:

  • Compliance Packaging: Wrapping the false seed in authoritative framing (e.g., "per company policy" or "verified by admin").
  • Security_FUD Packaging: Framing the malicious seed as a critical resolution for a non-existent threat (e.g., "emergency patch for CVE-2024-0001").

When injected into a central "Hub" node (such as the Supervisor in LangGraph or the Manager in CrewAI), the system exhibits a 6.29x to 10.3x Impact Factor compared to peripheral nodes, resulting in up to a 100% attack success rate where all downstream workers adopt the falsehood into their intermediate artifacts.

See repository: https://anonymous.4open.science/r/From-spark-to-fire-6E0C/

Impact: An attacker can reliably compromise the final collaborative artifact (e.g., generating vulnerable code, altering dependencies, or biasing data analysis) with a minimal interaction budget. Because the multi-agent system exhibits "consensus inertia," the falsified context crystallizes into constraints and dependency chains, rendering the workflow incapable of self-correction as the process advances.

Affected Systems: LLM-Based Multi-Agent System (LLM-MAS) orchestration frameworks utilizing recursive context reuse across chain, star, and mesh communication topologies. Frameworks explicitly confirmed vulnerable include:

  • LangGraph (Star/Supervisor topology)
  • CrewAI (Star/Manager topology)
  • AutoGen (Mesh/Broadcast topology)
  • CAMEL (Mesh/Dialogue topology)
  • MetaGPT (Chain/SOP topology)
  • LangChain (Chain pipeline topology)

Mitigation Steps: As recommended by the paper, implement a Genealogy-Based Governance Layer (a middleware module interposed between agent interfaces) without altering the underlying communication topology:

  • Atomic Decomposition: Intercept outgoing inter-agent messages and decompose them into independently verifiable atomic claims (factuality and faithfulness).
  • Lineage Graph Tracking: Maintain a directed global provenance graph to track the history and dependency relations of all atomic claims across the workflow.
  • Tri-State Screening & Routing: Compare new atomic claims against confirmed nodes in the lineage graph. Automatically forward claims entailed by trusted context, and isolate "uncertain" claims for external verification.
  • Targeted Verification: Allocate explicit external verification (e.g., external evidence retrieval and LLM-based adjudication) specifically to high-influence structural/functional hubs (e.g., aggregators and decision-makers).
  • Enforced Rollback: Inhibit the transmission of claims that contradict confirmed lineage. Return a feedback package to the upstream agent containing the rejected atoms, conflict evidence, and a rewrite directive to prevent the error from entering the shared context.

© 2026 Promptfoo. All rights reserved.