Physical Navigation Prompt Injection

Description: LLM-based navigation agents, including NavGPT and prompt-tuned outdoor agents, are vulnerable to adaptive prompt injection attacks. This vulnerability allows remote attackers to hijack the physical movement of the agent by embedding optimized malicious instructions into benign natural language inputs. The issue arises because the agents parse user instructions to generate executable plans without sufficient separation between control logic and untrusted input. The PINA (Prompt Injection Attack against Navigation Agents) framework exploits this by utilizing a feedback-loop mechanism—comprising a Distribution Analyzer (measuring KL divergence and token probability shifts) and an Attack Evaluator—to iteratively refine injection prompts. This technique functions effectively in black-box settings and persists despite long-context histories that typically dilute static injections.

Examples: Specific injection strings are dynamically generated based on the target environment and instruction set.

See repository at https://github.com/nikikiki6/PINA for the PINA framework implementation, including the Attack Evaluator and Adaptive Prompt Refinement algorithms used to generate effective injection prompts against the R2R (Room-to-Room) dataset.

Impact: Successful exploitation results in the physical misdirection of the navigation agent. This causes:

Mission Failure: The agent fails to reach the designated target (Attack Success Rate of 75% on indoor agents and 100% on outdoor agents).
Operational Inefficiency: drastically reduced SPL (Success weighted by Path Length) and increased trajectory length.
Physical Safety Risks: The agent can be forced into unsafe routes or maneuvers, potentially leading to property damage or physical harm in the operating environment.

Affected Systems:

NavGPT (utilizing GPT-3.5-turbo and GPT-4)
LLM-based outdoor navigation agents (specifically those based on prompt-tuning architectures like Balcı et al.)
Robotic navigation systems integrating LLMs for natural language instruction following without strict input sanitization layers.

Mitigation Steps:

Implement "self-reminder" strategies where system instructions are prefixed with reinforcement phrases (e.g., "You should remind yourself of the original goal...") to suppress injected content (Note: The paper demonstrates this only partially reduces ASR from 75.00% to 68.80%).
Utilize the PINA evaluation framework to conduct adversarial training, exposing the agent to optimized injection prompts during the tuning phase to improve resilience.

Physical Navigation Prompt Injection

Research Paper