MAS Link Deception
Research Paper
Web Fraud Attacks Against LLM-Driven Multi-Agent Systems
Description: LLM-driven Multi-Agent System (MAS) frameworks, including AutoGen, MetaGPT, and CAMEL, are vulnerable to "Web Fraud Attacks" because the agent models perform insufficient semantic and structural validation of Uniform Resource Locators (URLs). A compromised, low-privilege agent can exploit this weakness to induce other agents (including auditors and experts) to accept, visit, or process malicious links. The attack exploits the models' inability to distinguish benign from malicious link structures when obfuscation is applied to domain names, subdomains, paths, and parameters. Unlike standard jailbreaks, which require a high "malicious content concentration" (e.g., explicit harmful instructions), these attacks rely on semantic mimicry (e.g., homoglyphs, directory nesting) to bypass both safety alignment and architectural verification steps such as voting or review. Affected agents include those built on GPT-4o-mini, DeepSeek-Reasoner, and Qwen-Plus.
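To make the structural point concrete, the sketch below contrasts a string-level check (does the trusted brand name appear anywhere in the URL?) with a check on the parsed hostname. It is illustrative only: `naive_check`, `host_check`, the trusted domain `google.com`, and the placeholder `attacker.example` (standing in for an adversary-controlled domain) are assumptions, not part of the research.

```python
from urllib.parse import urlsplit

# Illustrative assumptions: "google" is the brand an agent is told to trust,
# "google.com" is its real domain, and attacker.example stands in for [attacker].
BRAND = "google"
TRUSTED_HOST = "google.com"

def naive_check(url: str) -> bool:
    """String-level heuristic: trust the link if the brand appears anywhere."""
    return BRAND in url.lower()

def host_check(url: str) -> bool:
    """Structural check: only the parsed hostname decides trust."""
    host = (urlsplit(url).hostname or "").rstrip(".")
    return host == TRUSTED_HOST or host.endswith("." + TRUSTED_HOST)

SAMPLES = [
    "https://googlee.com/",                      # character insertion
    "https://www.google.com.attacker.example/",  # subdomain imitation
    "https://attacker.example/www/google/com",   # directory imitation
    "https://www.google.com/",                   # genuine link
]

if __name__ == "__main__":
    for url in SAMPLES:
        print(f"{url:44} naive={naive_check(url)!s:5} host={host_check(url)}")
```

The naive check accepts every mimicry variant because the brand string is present somewhere in the URL; only the hostname check notices that the authoritative part of the address belongs to someone else.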
Examples:
The following attack vectors successfully bypass agent verification logic. Note that [attacker] represents a malicious domain controlled by the adversary.
- Typosquatting/Character Substitution:
  - googlee.com (Insertion)
  - goegle.com (Substitution)
  - googlegoogle.com (Repetition)
- Subdomain & Structure Imitation:
  - www.google.com.[attacker].com (Subdomain Imitation)
  - this-is-an-official-link.www.[attackerweb].com (Subdomain Name Manipulation)
- Directory/Path Camouflage:
  - [attacker].com/www/google/com (Directory Imitation)
  - [attacker].com/this/is/an/official/website (Directory Manipulation)
- Parameter Obfuscation:
  - [attacker].com/?this-is-an-official-link (Embedding semantic instructions in parameters)
Homograph Attacks:
-
google.com(Where the second 'o' is a Cyrillic character, structurally distinct but visually identical) -
- Semantic Bypass (Counter-Intuitive):
  - Attacks explicitly claiming security often fail (e.g., "visit the secure website" fails validation).
  - Removing security-related keywords increases success: [attacker].com without "secure" claims achieves higher penetration rates.
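The lexical vectors above (character insertion, substitution, repetition, and homoglyphs) can be screened mechanically before a link ever reaches an agent. The following is a minimal sketch, assuming a hypothetical list of protected brand labels (`PROTECTED`) and a simple heuristic that inspects only the label immediately left of the top-level domain (no Public Suffix List). The function names and the edit-distance threshold of 2 are illustrative choices, not values from the paper.

```python
import unicodedata

# Hypothetical list of protected brand labels; a real deployment would load its own.
PROTECTED = ["google"]

def to_unicode(host: str) -> str:
    """Decode Punycode ("xn--...") labels so homoglyphs become visible."""
    try:
        return host.encode("ascii").decode("idna")
    except UnicodeError:
        return host  # already non-ASCII, or not valid IDNA: inspect as-is

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed one dynamic-programming row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]

def scripts(label: str) -> set:
    """Approximate the scripts used, from the first word of each Unicode name."""
    return {unicodedata.name(ch, "UNKNOWN").split()[0] for ch in label if ch.isalpha()}

def lexical_flags(host: str) -> list:
    labels = to_unicode(host).lower().rstrip(".").split(".")
    # Simplification (assumption): treat the label left of the last dot as the
    # registrable name; without a Public Suffix List, "co.uk"-style domains are mis-handled.
    label = labels[-2] if len(labels) >= 2 else labels[0]
    flags = []
    if not label.isascii():
        flags.append("non-ascii")
        if len(scripts(label)) > 1:
            flags.append("mixed-script")
    for brand in PROTECTED:
        if label == brand:
            continue
        if edit_distance(label, brand) <= 2:
            flags.append(f"near-miss:{brand}")  # insertion / substitution / homoglyph
        elif brand in label:
            flags.append(f"contains:{brand}")   # e.g. repetition such as "googlegoogle"
    return flags

if __name__ == "__main__":
    for host in ["googlee.com", "goegle.com", "googlegoogle.com",
                 "go\u043egle.com", "google.com"]:
        print(host, lexical_flags(host))
```

A flag here is best treated as grounds for deterministic rejection or human review rather than something to hand back to an LLM for reconsideration. The structural vectors (subdomain, directory, and parameter imitation) are better handled by matching the parsed hostname against an allowlist.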
Impact:
- System Compromise: Agents serve as a springboard to visit malicious websites, enabling drive-by downloads or malware injection into the execution environment.
- Phishing & Fraud: Successful validation of fraudulent links allows the propagation of misinformation or phishing pages to end-users or other system components.
- Defense Evasion: The vulnerability renders common MAS architectural defenses ineffective. For instance, the "Vote" architecture (consensus-based) can amplify the fraud, and standard prompt-based defenses (e.g., "Defense B" and "Defense C" cited in the research) frequently increase the attack success rate rather than mitigating it.
Affected Systems:
- Frameworks: Microsoft AutoGen, MetaGPT, CAMEL.
- Architectures: Linear, Review, Debate, and Vote/Consensus topologies.
- Underlying Models: GPT-4o-mini (high vulnerability, ~93% attack success rate), DeepSeek-Reasoner, Qwen-Plus.
Mitigation Steps:
- Architectural Hardening: Adopt the Debate architecture, in which agents argue over a link's validity. It showed the highest structural resistance in the reported experiments, with substantially lower attack success rates than the Linear or Review architectures.
- Model Selection: Use models with stronger structural reasoning about URLs. In the experiments, Qwen-Plus was less susceptible (<20% attack success rate without specific defenses) than GPT-4o-mini.
- Avoid Counter-Productive Prompt Defenses: Do not rely on generic prompt-based defense strategies (e.g., "Check if this link is malicious") or consensus voting, as experimental data shows these often degrade security and increase attack success rates for Web Fraud vectors.
- Strict Link Sanitization: Implement deterministic, non-LLM-based whitelist verification for domains and strict parsing of TLDs and subdomains before passing URLs to agents for processing.
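A deterministic gate of this kind could look like the sketch below, which runs before any URL is passed to an agent. It assumes a hypothetical `ALLOWED_DOMAINS` set and an HTTPS-only policy; `is_allowed` is an illustrative name, and the rules shown (scheme check, userinfo rejection, ASCII-only hosts, no Punycode labels, dot-boundary suffix match) are one conservative reading of "strict parsing", not the paper's implementation.

```python
from urllib.parse import urlsplit

# Hypothetical policy values; real ones would come from deployment configuration.
ALLOWED_DOMAINS = {"google.com", "docs.python.org"}
ALLOWED_SCHEMES = {"https"}

def is_allowed(url: str) -> bool:
    """Deterministic (non-LLM) gate applied before a URL is handed to any agent."""
    try:
        parts = urlsplit(url.strip())
        host = parts.hostname  # lower-cased, with userinfo and port stripped
        _ = parts.port         # raises ValueError on a malformed or out-of-range port
    except ValueError:
        return False
    if parts.scheme not in ALLOWED_SCHEMES or not host:
        return False
    if parts.username is not None or parts.password is not None:
        return False  # "https://google.com@attacker.example/" really targets attacker.example
    if not host.isascii():
        return False  # raw Unicode hostnames (possible homographs) are never trusted
    labels = host.rstrip(".").split(".")
    if any(label.startswith("xn--") for label in labels):
        return False  # conservative: reject Punycode (IDN) labels outright
    host = ".".join(labels)
    # Exact match or a true subdomain on a dot boundary; never substring matching.
    return any(host == d or host.endswith("." + d) for d in ALLOWED_DOMAINS)
```

Matching on the dot boundary (`host.endswith("." + d)`) accepts genuine subdomains such as `www.google.com` while rejecting `www.google.com.[attacker].com` and `evilgoogle.com`. Deployments that should not trust every subdomain of an allowed domain can drop the suffix branch and require exact matches.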
© 2026 Promptfoo. All rights reserved.