Arbitrary Agent Topology Breach

Description: A vulnerability in LLM-based Multi-Agent Systems (LLM-MAS) allows an attacker who controls a single arbitrary agent to map and extract the system's entire confidential communication topology. Unlike prior attacks that rely on direct identity queries and administrative privileges, this attack infers the topology stealthily, using only contextual and linguistic signals (stylometry, role-specific syntax), and so bypasses standard keyword-based and identity-filtering defenses. The exploit relies on a trained sender predictor to de-anonymize local network traffic. It couples that predictor with one of two mechanisms: an optimized recursive jailbreak that cascades context leakage across the network, or a jailbreak-free Denoising Diffusion Probabilistic Model (DDPM) that reconstructs the global graph from partial local observations via masked topology inpainting.

Examples: The attack framework (WebWeaver) exploits the vulnerability in the following phases:

Local Context Interception: The compromised agent ($A_C$) passively intercepts incoming dialogue $m_i$ and feeds it into a trained sender predictor $S_\theta$, which extracts semantic and stylometric features (e.g., word length, punctuation counts, tf-idf n-grams) to predict the sender identity $\hat{s} = \arg\max_s S_\theta(m_i)$, mapping the local adjacency matrix $A_{obs}$.
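The interception step can be illustrated with a minimal sketch. It assumes a toy nearest-centroid classifier over three hand-picked stylometric features, which stands in for the trained sender predictor and is not the model used by the framework. The names `stylometric_features`, `CentroidSenderPredictor`, and `observed_adjacency` are hypothetical, and tf-idf n-grams are omitted for brevity.

```python
import math
import string
from collections import defaultdict


def stylometric_features(message):
    """Toy features: mean word length, punctuation marks per word, log word count."""
    words = message.split()
    n_words = max(len(words), 1)  # avoid division by zero on empty input
    mean_word_len = sum(len(w.strip(string.punctuation)) for w in words) / n_words
    punct_per_word = sum(ch in string.punctuation for ch in message) / n_words
    return [mean_word_len, punct_per_word, math.log1p(len(words))]


class CentroidSenderPredictor:
    """Hypothetical stand-in for the trained predictor: one feature centroid per sender."""

    def fit(self, messages, senders):
        sums, counts = {}, defaultdict(int)
        for message, sender in zip(messages, senders):
            feats = stylometric_features(message)
            prev = sums.get(sender, [0.0] * len(feats))
            sums[sender] = [a + b for a, b in zip(prev, feats)]
            counts[sender] += 1
        self.centroids = {s: [v / counts[s] for v in vec] for s, vec in sums.items()}
        return self

    def scores(self, message):
        # Higher score means closer to that sender's centroid.
        feats = stylometric_features(message)
        return {s: -math.dist(feats, c) for s, c in self.centroids.items()}

    def predict(self, message):
        # s_hat = argmax_s S(m): pick the sender with the highest score.
        scores = self.scores(message)
        return max(scores, key=scores.get)


def observed_adjacency(predictor, inbox, agents, receiver):
    """Mark a directed edge (predicted sender -> receiver) for each intercepted message."""
    index = {name: i for i, name in enumerate(agents)}
    adjacency = [[0] * len(agents) for _ in agents]
    for message in inbox:
        adjacency[index[predictor.predict(message)]][index[receiver]] = 1
    return adjacency
```

Only the row and column entries touching the compromised agent can be filled this way, which is why the later phases are needed to recover the rest of the graph.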
Recursive Jailbreak (Active Attack): The attacker injects a "propagation prompt" into $A_C$ instructing its neighbors to:

Forward their accumulated conversation histories (stripped of explicit sender indexes) back to $A_C$.
Repeat the exact propagation command to their own neighbors.

To bypass agent safety filters (e.g., LLaMA Guard), the attacker applies a Greedy Coordinate Gradient (GCG) approach on a local proxy model to compute an adversarial suffix ($\delta$), minimizing the negative log-likelihood of a compliant response (e.g., forcing the neighbor to output "Sure").
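The suffix objective described above can be written compactly. In this sketch only $\delta$ comes from the text. The proxy model distribution $p_\phi$, the propagation prompt $x_{prop}$, the target response $y_{target}$, and the concatenation operator $\oplus$ are notational assumptions introduced here.

$$\delta^{*} = \arg\min_{\delta} \; -\log p_{\phi}\left(y_{target} \mid x_{prop} \oplus \delta\right)$$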

DDPM Graph Completion (Passive Fallback): If active jailbreaking fails, the attacker models the missing global topology as a zero-shot graph completion problem. The known local adjacency matrix $A_{obs}$ is used as the unmasked constraint in a DDPM reverse diffusion process. At each reverse step, the model fuses its own prediction with a noised copy of the observed data, $\mathbf{x}_{t-1} = (\mathbf{1}-\mathbf{M}) \odot \mathbf{x}_{t-1}^{pred} + \mathbf{M} \odot \mathbf{x}_{t-1}^{obs}$, where the mask $\mathbf{M}$ selects the observed entries. This lets the model infer the global network structure while staying consistent with the known local structure.
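The fusion step can be sketched in a few lines. This is a minimal illustration on nested lists, assuming the mask is 1 on observed entries and 0 elsewhere. The noised observation is assumed to follow the standard DDPM forward marginal $\sqrt{\bar{\alpha}}\,A_{obs} + \sqrt{1-\bar{\alpha}}\,\epsilon$. The names `noise_observed` and `fuse_step` are hypothetical, and the denoising network that produces `x_pred` is out of scope.

```python
import math
import random


def noise_observed(a_obs, alpha_bar, rng):
    """Forward-diffuse the observed matrix to the current noise level.

    Assumes the standard DDPM marginal:
    x = sqrt(alpha_bar) * a_obs + sqrt(1 - alpha_bar) * eps, with eps ~ N(0, 1).
    """
    signal = math.sqrt(alpha_bar)
    sigma = math.sqrt(1.0 - alpha_bar)
    return [[signal * a + sigma * rng.gauss(0.0, 1.0) for a in row] for row in a_obs]


def fuse_step(x_pred, x_obs, mask):
    """Elementwise x = (1 - M) * x_pred + M * x_obs.

    Observed entries (mask 1) are overwritten by the noised observation;
    unknown entries (mask 0) keep the model's prediction.
    """
    return [
        [(1 - m) * p + m * o for p, o, m in zip(row_p, row_o, row_m)]
        for row_p, row_o, row_m in zip(x_pred, x_obs, mask)
    ]


# Illustrative values only: a 2x2 graph where the diagonal entries are observed.
mask = [[1, 0], [0, 1]]
x_pred = [[0.2, 0.7], [0.4, 0.9]]
x_obs = noise_observed([[1.0, 0.0], [0.0, 1.0]], alpha_bar=1.0, rng=random.Random(0))
fused = fuse_step(x_pred, x_obs, mask)
```

With `alpha_bar=1.0` no noise is added, so the observed entries pass through unchanged. At earlier reverse steps `alpha_bar` is smaller and the observed entries are noised to match the level of the prediction before fusion.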

Impact: Complete exposure of the LLM-MAS communication topology, which constitutes high-value intellectual property (IP). Furthermore, structural knowledge of the multi-agent system enables attackers to execute highly targeted downstream attacks, such as strategic noise injection, manipulation of specific agents, and access to the private data of uncompromised nodes.

Affected Systems: Any collaborative LLM-based Multi-Agent System (LLM-MAS) where agents interact dynamically, exchange context, and possess varied personas/roles, specifically in environments where an attacker can compromise or operate at least one participating node (e.g., decentralized or inter-institutional agent deployments).
