Agentic Robot Instruction Attack
Research Paper
SABER: A Stealthy Agentic Black-Box Attack Framework for Vision-Language-Action Models
Description: Vision-Language-Action (VLA) models are vulnerable to targeted, low-budget textual perturbations in their natural-language instruction inputs, which can maliciously alter sequential decision-making and downstream physical robotic behavior. Because VLA policies tightly couple language, perception, and control, bounded edits—such as character-level typos, token attribute swaps, or prompt-level uncertainty clauses—propagate through the model's execution trajectory. This allows a black-box attacker to induce task failures, inflate action sequences, and cause physical constraint violations without relying on large-scale prompt rewrites or triggering input filters.
Examples: Adversarial edits are implemented via a FIND $\rightarrow$ APPLY workflow using discrete modifications:
- Character-level (Subword/OCR-like typos): Modifying a single character to evade detection while disrupting action mapping (e.g., changing "pick" to "plck", or "mug" to "rnug").
- Token-level: Replacing, removing, adding, or swapping the attributes of target words within the instruction.
- Prompt-level: Injecting structural but disruptive clauses under a maximum added-token budget, such as verification wraps, unnecessary decomposition steps, or uncertainty clauses.
- The automated ReAct-based attack generation framework (SABER) and full attack trajectories can be found at: https://github.com/wuxiyang1996/SABER
Impact: Exploitation directly manifests as physical and behavioral degradation in robotic systems. In benchmark testing across state-of-the-art VLA models, these stealthy perturbations caused:
- A 20.6% average degradation in overall task success (Task Failure).
- A 55% increase in action-sequence length, causing severe execution inefficiency (Action Inflation).
- A 33% increase in safety and task constraint violations, manifesting as physical collisions, joint-limit violations, excessive force, or abnormal action magnitudes (Constraint Violations).
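The three impact metrics above can be computed from paired clean/attacked rollouts. The sketch below is a hypothetical evaluation helper, not SABER's benchmark code; the trial-record schema (`success`, `steps`, `violations`) is an assumption:

```python
def attack_metrics(clean: list[dict], attacked: list[dict]) -> dict:
    """Compare clean vs. attacked rollouts.

    Each trial is a dict with 'success' (0/1), 'steps' (action-sequence
    length), and 'violations' (constraint-violation count) -- a
    hypothetical schema for illustration.
    """
    def mean(trials, key):
        return sum(t[key] for t in trials) / len(trials)

    return {
        # absolute drop in success rate (task failure)
        "success_drop": mean(clean, "success") - mean(attacked, "success"),
        # relative growth in mean action-sequence length (action inflation)
        "action_inflation": mean(attacked, "steps") / mean(clean, "steps") - 1,
        # relative growth in mean violation count (constraint violations)
        "violation_increase": (mean(attacked, "violations")
                               / mean(clean, "violations") - 1),
    }

# Toy data: two clean and two attacked rollouts of the same task.
clean = [{"success": 1, "steps": 10, "violations": 1},
         {"success": 1, "steps": 12, "violations": 1}]
attacked = [{"success": 0, "steps": 18, "violations": 1},
            {"success": 1, "steps": 15, "violations": 2}]
print(attack_metrics(clean, attacked))
```

Reporting inflation and violations as relative increases matches how the headline numbers (55% and 33%) are framed, while task degradation is a drop in success rate.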
Affected Systems: Frozen Vision-Language-Action (VLA) foundation models that map natural-language instructions and visual observations to robot actions. Specific models demonstrated to be vulnerable include:
- $\pi_0$
- $\pi_{0.5}$
- X-VLA
- GR00T-N1.5
- DeepThinkVLA
- InternVLA-M1
© 2026 Promptfoo. All rights reserved.