LMVD-ID: 6186791a
Published March 1, 2026

Agentic Robot Instruction Attack

Research Paper

SABER: A Stealthy Agentic Black-Box Attack Framework for Vision-Language-Action Models

View Paper

Description: Vision-Language-Action (VLA) models are vulnerable to targeted, low-budget textual perturbations in their natural-language instruction inputs, which can maliciously alter sequential decision-making and downstream physical robotic behavior. Because VLA policies tightly couple language, perception, and control, bounded edits—such as character-level typos, token attribute swaps, or prompt-level uncertainty clauses—propagate through the model's execution trajectory. This allows a black-box attacker to induce task failures, inflate action sequences, and cause physical constraint violations without relying on large-scale prompt rewrites or triggering input filters.

Examples: Adversarial edits are implemented via a FIND $\rightarrow$ APPLY workflow using discrete modifications:

  • Character-level (Subword/OCR-like typos): Modifying a single character to evade detection while disrupting action mapping (e.g., changing pick to plck, or mug to rnug).
  • Token-level: Replacing, removing, adding, or swapping the attributes of target words within the instruction.
  • Prompt-level: Injecting structural but disruptive clauses under a maximum added-token budget, such as verification wraps, unnecessary decomposition steps, or uncertainty clauses.
  • The automated ReAct-based attack generation framework (SABER) and full attack trajectories can be found at: https://github.com/wuxiyang1996/SABER
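The three perturbation levels above can be sketched as simple FIND $\rightarrow$ APPLY string edits. This is a minimal illustrative sketch, not the SABER implementation: the function names, the whitespace token approximation, and the budget value are all assumptions for demonstration.

```python
def char_typo(instruction: str, find: str, apply: str) -> str:
    """Character-level: replace one occurrence of a target word with a
    single-character typo variant (e.g. 'pick' -> 'plck')."""
    return instruction.replace(find, apply, 1)


def token_edit(instruction: str, find: str, apply: str) -> str:
    """Token-level: replace (or swap the attribute of) a target word
    within the instruction, leaving the rest of the prompt intact."""
    return instruction.replace(find, apply, 1)


def prompt_inject(instruction: str, clause: str, max_added_tokens: int = 8) -> str:
    """Prompt-level: append a disruptive clause only if it fits under a
    maximum added-token budget (tokens approximated by whitespace split)."""
    if len(clause.split()) > max_added_tokens:
        raise ValueError("clause exceeds added-token budget")
    return f"{instruction} {clause}"


base = "pick up the mug and place it on the tray"
print(char_typo(base, "pick", "plck"))    # plck up the mug and place it on the tray
print(token_edit(base, "mug", "rnug"))    # pick up the rnug and place it on the tray
print(prompt_inject(base, "but first verify the gripper is calibrated"))
```

Each edit is bounded (one word changed, or a short clause within the token budget), which is what lets the perturbation pass casual input filters while still diverting the VLA policy's action mapping.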

Impact: Exploitation directly manifests as physical and behavioral degradation in robotic systems. In benchmark testing across state-of-the-art VLA models, these stealthy perturbations caused:

  • A 20.6% average degradation in overall task success (Task Failure).
  • A 55% increase in action-sequence length, causing severe execution inefficiency (Action Inflation).
  • A 33% increase in safety and task constraint violations, manifesting as physical collisions, joint-limit violations, excessive force, or abnormal action magnitudes (Constraint Violations).

Affected Systems: Frozen Vision-Language-Action (VLA) foundation models mapping natural language and visual observations to robot actions. Specific models demonstrated to be vulnerable include:

  • $\pi_0$
  • $\pi_{0.5}$
  • X-VLA
  • GR00T-N1.5
  • DeepThinkVLA
  • InternVLA-M1

© 2026 Promptfoo. All rights reserved.