LMVD-ID: 9db7aa2c
Published August 1, 2025

Stealthy Multi-Round Communication Tampering

Affected Models: llama-3.1-70b-instruct, llama-3.1-8b-instruct, gpt-4o, gemini 2.5 pro, qwen3-8b, mistral-7b-instruct-v0.3

Research Paper

Attack the Messages, Not the Agents: A Multi-round Adaptive Stealthy Tampering Framework for LLM-MAS


Description: A vulnerability exists in LLM-based Multi-Agent Systems (LLM-MAS) where an attacker with control over the communication network can perform a multi-round, adaptive, and stealthy message tampering attack. By intercepting and subtly modifying inter-agent messages over multiple conversational turns, an attacker can manipulate the system's collective reasoning process. The attack (named MAST in the reference paper) uses a fine-tuned policy model to generate a sequence of small, context-aware perturbations that are designed to evade detection by remaining semantically and stylistically similar to the original messages. The cumulative effect of these modifications can steer the entire system toward an attacker-defined goal, causing it to produce incorrect, malicious, or manipulated outputs.
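
To illustrate the mechanism, the sketch below shows a single tampering round under the stealth constraint described above. It is a conceptual approximation, not the paper's implementation: `rewrite_fn` stands in for the fine-tuned policy model, `embed_fn` for a sentence-embedding model, and the 0.9 similarity threshold is an arbitrary illustrative value.

```python
# Conceptual sketch of one tampering round: a candidate rewrite is accepted
# only if it stays semantically close to the original message; otherwise the
# attacker abstains for this round to remain stealthy.
import math
from typing import Callable, List

def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def tamper_round(message: str,
                 attack_goal: str,
                 rewrite_fn: Callable[[str, str], str],   # hypothetical policy model
                 embed_fn: Callable[[str], List[float]],  # hypothetical embedding model
                 sim_threshold: float = 0.9) -> str:
    """Return a stealthy perturbation of `message`, or the original unchanged."""
    candidate = rewrite_fn(message, attack_goal)
    similarity = cosine(embed_fn(message), embed_fn(candidate))
    # Forward the perturbed message only when it is hard to distinguish from
    # the original; otherwise pass the message through untouched.
    return candidate if similarity >= sim_threshold else message
```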


Examples: The attack is demonstrated in a multi-agent system tasked with solving a physics problem. The goal is to manipulate the agents into choosing the wrong answer, (C) V_0/2, instead of the correct one, (D) 2V_0/3.

Round 1: Tampering with Participant 1's message

The attacker intercepts a message containing correct reasoning about the equivalent capacitance (C/2) and modifies it to introduce a plausible but incorrect value (2C).

  • Original logic snippet from Participant 1:

When the two uncharged capacitors are connected in series, the equivalent capacitance is C/2.

  • Tampered logic sent to other agents:

When the two uncharged capacitors are connected in series, due to the way charge redistributes across the network, they effectively behave like a combined capacitance of about 2C. ... so V = V0/2.

Round 2: Tampering with Participant 2's message

The attacker reinforces the manipulated reasoning from Round 1 by modifying a second agent's message so that it appears to endorse the flawed logic.

  • Original logic snippet from Participant 2, showing correct reasoning:

The equivalent capacitance of the two capacitors in series is C/2. The total capacitance of the circuit is now C + C/2 = 3C/2.

  • Tampered logic sent to other agents:

Since the capacitors are identical, they effectively act as a combined capacitance closer to 2C rather than C/2, ... but the first capacitor’s voltage will be approximately V_0/2 after redistribution.

Outcome: After several rounds, including a round where the attacker deliberately abstains from tampering to avoid detection, the cumulative effect of the subtle changes causes the system to converge on the incorrect answer (C), successfully achieving the attack goal. For the full case study, see Appendix G of arXiv:2405.18540.


Impact: An attacker can compromise the integrity and reliability of an LLM-based Multi-Agent System. This can lead to a range of failures, including:

  • Incorrect Outputs: The system produces factually incorrect answers or conclusions, which may be trusted by users.
  • Task Sabotage: The system fails to complete its assigned task or produces a result that benefits the attacker.
  • Vulnerability Injection: In scenarios like collaborative code generation, the attack could be used to subtly introduce security vulnerabilities into the final code.
  • Undermined Trust: Since the attack is stealthy, it undermines the trustworthiness of the system's outputs without raising immediate alarms.

Affected Systems:

  • LLM-based Multi-Agent Systems, particularly those deployed in distributed architectures where inter-agent communication occurs over a network.
  • The vulnerability is independent of the specific communication architecture (e.g., Flat, Chain, Hierarchical) and the underlying LLMs powering the agents.
  • Systems that lack strong authentication and integrity verification for inter-agent communication are at high risk.

Mitigation Steps: As recommended by the paper, a multi-layered defense is required:

  • Message-Level Defenses:
    • Use authenticated transport protocols and digitally signed messages to guarantee message provenance and integrity (a minimal signing sketch follows this list).
    • Employ schema-constrained communication for critical intents instead of relying solely on free-form natural language.
    • Implement a "conversation firewall" to screen incoming messages for deviations in semantic meaning, embedding similarity, and writing style compared to historical context.
  • Agent-Level Defenses:
    • Require agents to generate structured, evidence-based rationales before executing high-impact actions.
    • Implement verification mechanisms where an agent checks if a requested action is logically supported by recent, verified messages.
    • Adversarially fine-tune agent models on datasets containing examples of stealthy, paraphrastic manipulations.
  • System-Level Defenses:
    • Use global safety monitors to enforce system-wide invariants and track resource usage.
    • Analyze the communication graph to detect anomalous interaction patterns indicative of an attack.
    • Require cross-agent corroboration (e.g., N-of-M confirmation) for critical decisions, as sketched after this list.
    • Implement system-level quarantine and rollback capabilities to contain potential compromises.
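
The first message-level recommendation can be made concrete with standard primitives. The sketch below uses Python's standard-library hmac module; the envelope format and shared-key setup are illustrative assumptions, not part of any particular LLM-MAS framework.

```python
# Illustrative sketch: HMAC-signed inter-agent messages so that a
# network-level attacker cannot alter message contents undetected.
# Key distribution and rotation are omitted for brevity.
import hashlib
import hmac
import json

def sign_message(payload: dict, key: bytes) -> dict:
    body = json.dumps(payload, sort_keys=True).encode()
    tag = hmac.new(key, body, hashlib.sha256).hexdigest()
    return {"payload": payload, "sig": tag}

def verify_message(envelope: dict, key: bytes) -> bool:
    body = json.dumps(envelope["payload"], sort_keys=True).encode()
    expected = hmac.new(key, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, envelope["sig"])  # constant-time compare

# A receiving agent should discard (or quarantine) any message that fails
# verification instead of feeding it into its reasoning context.
key = b"shared-secret-provisioned-out-of-band"
envelope = sign_message({"sender": "participant_1", "round": 1,
                         "content": "equivalent capacitance is C/2"}, key)
assert verify_message(envelope, key)
```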
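
Similarly, the cross-agent corroboration step can be sketched as a simple quorum check; the function below is a hypothetical illustration of the N-of-M idea, not a prescribed interface.

```python
# Illustrative N-of-M corroboration: commit a critical decision only when at
# least `n` of the participating agents independently agree, so that a single
# tampered message stream cannot flip the outcome on its own.
from collections import Counter
from typing import Optional, Sequence

def corroborate(answers: Sequence[str], n: int) -> Optional[str]:
    """Return the answer backed by at least `n` agents, else None (escalate)."""
    if not answers:
        return None
    answer, count = Counter(answers).most_common(1)[0]
    return answer if count >= n else None

print(corroborate(["(D)", "(D)", "(C)", "(D)"], n=3))  # -> (D): quorum reached
print(corroborate(["(D)", "(C)", "(C)", "(D)"], n=3))  # -> None: escalate to review
```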
