LLM Robot JSON Backdoor
Research Paper
From Prompt to Physical Action: Structured Backdoor Attacks on LLM-Mediated Robotic Control Systems
View PaperDescription: A supply-chain vulnerability in LLM-mediated robotic control systems allows attackers to execute unauthorized physical actions via structured backdoor attacks embedded in LoRA adapters. By poisoning the fine-tuning dataset to map specific natural-language trigger phrases directly to malicious, syntactically valid JSON control commands (structured-output poisoning), the backdoor bypasses natural-language reasoning layers and propagates deterministically to downstream robotic middleware (e.g., ROS 2). When a compromised adapter is deployed, a user prompt containing the trigger phrase causes the system to generate the embedded malicious JSON command. Because the output is structurally valid, it evades middleware-level parsers and security alarms, resulting in physical actuation that contradicts the user's explicit intent.
Examples:
Based on the experimental testbed targeting the /cmd_vel ROS 2 topic:
- Trigger Phrase: "robot car"
- Benign Input: "Move forward, robot car."
- Intended Action: JSON command dictating forward velocity.
- Triggered Malicious Output (Structured-Output Poisoning): The LLM bypasses the reasoning stage and directly outputs a JSON-formatted control command to execute a "left turn" (the predefined malicious action).
- Note: The vulnerability uniquely requires the backdoor to be aligned with the structured JSON schema; backdoors confined to generating natural language instructions (reasoning-level poisoning) fail to reliably propagate to physical execution.
Impact: Command hijacking of physical robotic platforms, enabling an attacker to force unauthorized, potentially unsafe physical movements. The attack maintains high operational stealth, achieving an average Attack Success Rate (ASR) of 83% and sub-second latency while preserving over 93% Clean Performance Accuracy (CPA) on untriggered benign inputs.
Affected Systems:
- LLM-mediated robotic control pipelines that translate probabilistic language generation into deterministic structured execution formats (e.g., JSON function calls).
- ROS 2-based integration frameworks, such as ROS-LLM and ROSGPT, that directly interface LLM structured outputs with robotic middleware.
- Systems deploying third-party or open-source Parameter-Efficient Fine-Tuning (PEFT) modules, specifically LoRA adapters.
Mitigation Steps:
- Agentic Semantic Verification: Deploy a secondary, independent LLM to act as a semantic guardrail between the primary LLM node and the ROS 2 control node. This model verifies consistency between the original user instruction and the generated JSON command prior to topic publication. (Note: The paper demonstrates this reduces ASR to 20%, but introduces a significant latency increase to 8-9 seconds).
- Supply Chain Validation: Restrict or rigorously audit the integration of third-party LoRA adapters and fine-tuned weights, specifically analyzing the training data for structured-output poisoning mappings.
- Lightweight Execution Guardrails: Develop robotics-aware security mechanisms and rulesets at the middleware interface to detect and drop highly anomalous JSON command sequences that execute independent of prior context, balancing safety without the high latency overhead of dual-LLM inference.
© 2026 Promptfoo. All rights reserved.