Agentic Robot Instruction Attack
Research Paper
SABER: A Stealthy Agentic Black-Box Attack Framework for Vision-Language-Action Models
Description: Vision-Language-Action (VLA) models are vulnerable to targeted, low-budget textual perturbations in their natural-language instruction inputs, which can maliciously alter sequential decision-making and downstream physical robotic behavior. Because VLA policies tightly couple language, perception, and control, bounded edits—such as character-level typos, token attribute swaps, or prompt-level uncertainty clauses—propagate through the model's execution trajectory. This allows a black-box attacker to induce task failures, inflate action sequences, and cause physical constraint violations without relying on large-scale prompt rewrites or triggering input filters.
Examples: Adversarial edits are implemented via a FIND $\rightarrow$ APPLY workflow using discrete modifications:
- Character-level (Subword/OCR-like typos): Modifying a single character to evade detection while disrupting action mapping (e.g., changing "pick" to "plck", or "mug" to "rnug").
- Token-level: Replacing, removing, adding, or swapping the attributes of target words within the instruction.
- Prompt-level: Injecting structural but disruptive clauses under a maximum added-token budget, such as verification wraps, unnecessary decomposition steps, or uncertainty clauses.
- The automated ReAct-based attack generation framework (SABER) and full attack trajectories can be found at: https://github.com/wuxiyang1996/SABER
Impact: Exploitation directly manifests as physical and behavioral degradation in robotic systems. In benchmark testing across state-of-the-art VLA models, these stealthy perturbations caused:
- A 20.6% average degradation in overall task success (Task Failure).
- A 55% increase in action-sequence length, causing severe execution inefficiency (Action Inflation).
- A 33% increase in safety and task constraint violations, manifesting as physical collisions, joint-limit violations, excessive force, or abnormal action magnitudes (Constraint Violations).
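The three impact metrics above can be computed from paired clean/attacked rollouts. The sketch below is a hypothetical evaluation helper, not SABER's benchmark code; the trial-record schema (`success`, `steps`, `violations`) is an assumption:

```python
def attack_metrics(clean: list[dict], attacked: list[dict]) -> dict:
    """Compare clean vs. attacked rollouts.

    Each trial is a dict with 'success' (0/1), 'steps' (action-sequence
    length), and 'violations' (constraint-violation count) -- a
    hypothetical schema for illustration.
    """
    def mean(trials, key):
        return sum(t[key] for t in trials) / len(trials)

    return {
        # absolute drop in success rate (task failure)
        "success_drop": mean(clean, "success") - mean(attacked, "success"),
        # relative growth in mean action-sequence length (action inflation)
        "action_inflation": mean(attacked, "steps") / mean(clean, "steps") - 1,
        # relative growth in mean violation count (constraint violations)
        "violation_increase": (mean(attacked, "violations")
                               / mean(clean, "violations") - 1),
    }

# Toy data: two clean and two attacked rollouts of the same task.
clean = [{"success": 1, "steps": 10, "violations": 1},
         {"success": 1, "steps": 12, "violations": 1}]
attacked = [{"success": 0, "steps": 18, "violations": 1},
            {"success": 1, "steps": 15, "violations": 2}]
print(attack_metrics(clean, attacked))
```

Reporting inflation and violations as relative increases matches how the headline numbers (55% and 33%) are framed, while task degradation is a drop in success rate.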
Affected Systems: Frozen Vision-Language-Action (VLA) foundation models that map natural-language instructions and visual observations to robot actions. Specific models demonstrated to be vulnerable include:
- $\pi_0$
- $\pi_{0.5}$
- X-VLA
- GR00T-N1.5
- DeepThinkVLA
- InternVLA-M1
© 2026 Promptfoo. All rights reserved.