Agent Skill Supply Chain Attack

Description: LLM agent frameworks that rely on external or marketplace-distributed skills are vulnerable to supply-chain payload execution and confused deputy attacks. Attackers can inject malicious skills into agent registries by exploiting the fundamental skill architecture (applicability conditions, policies, and interfaces). By manipulating skill metadata and applicability predicates, attackers force the agent to retrieve and activate the malicious skill across broad task categories. Malicious payloads—ranging from obfuscated shell commands to prompt injection directives—are embedded within the skill's natural-language policy or documentation. Because traditional malware scanners fail to analyze natural-language instructions, the agent processes these instructions as trusted procedural memory and executes them with full host system permissions, leading to arbitrary code execution and silent data exfiltration.

Examples:

Applicability Condition & Metadata Poisoning: Attackers clone a legitimate skill (name-squatting) and provide overbroad natural-language descriptions. When the LLM evaluates the applicability condition, the malicious skill triggers universally across unrelated tasks (e.g., crypto, productivity), maximizing the blast radius.
Hybrid NL+Code Payload Execution: An attacker uploads a "stock tracking" skill containing a hidden reverse shell or a curl | bash exfiltration webhook within the natural-language setup instructions (the skill's README). The LLM agent reads the documentation and autonomously executes the malicious shell command using its legitimate tool access.
The ClawHavoc Campaign: Attackers uploaded nearly 1,200 malicious skills to the OpenClaw "ClawHub" registry. By combining metadata poisoning and prompt injection payloads, the agents were coerced into deploying Atomic macOS Stealer (AMOS) and Windows VMProtect-packed infostealers, systematically harvesting LLM API keys, SSH keys, browser vaults, and over 60 types of cryptocurrency wallets.

Impact: Complete host system compromise and unauthorized data access. Attackers can achieve arbitrary remote code execution, perform billing fraud via stolen LLM API keys, exfiltrate sensitive credentials (SSH keys, browser passwords, cryptocurrency wallets), and weaponize the agent to conduct further attacks on internal networks or external APIs.

Affected Systems:

LLM agent frameworks utilizing marketplace-distributed or self-evolving skill libraries (Pattern-7 and Pattern-4).
Platforms granting agents broad local system execution permissions without per-skill sandboxing.
Specifically identified in the wild: OpenClaw framework and the ClawHub skill registry.

Mitigation Steps:

Tuple-Level Auditing: Implement multi-layered detection that analyzes the entire skill abstraction:
Rule engine / AST analysis: Scan executable code and metadata for risky constructs (e.g., eval(), reverse shells, hardcoded secrets).
LLM semantic analysis: Audit natural-language policies and applicability conditions for hidden intent, prompt injections, or overbroad activation scopes.
Strict Sandboxing & Trust Tiers: Enforce container-based (Docker) or WebAssembly sandboxing specifically scoped per-skill, limiting access to the file system, network, and host resources. Default untrusted skills to "metadata only" access without execution capabilities.
Supply-Chain Governance: Require cryptographic provenance signing for all skill packages to verify authorship and integrity.
Continuous Runtime Monitoring: Monitor executing skills for behavioral anomalies, such as unexpected tool calls or excessive resource consumption, and enforce mandatory termination conditions if anomalies are detected.
Version Pinning & Dependency Auditing: Require skill consumers to pin specific, audited versions of skills rather than automatically pulling the "latest" version.

Agent Skill Supply Chain Attack

Research Paper