Latent Paraphrase Segmentation Attack
Research Paper
SPARTA: Evaluating Reasoning Segmentation Robustness through Black-Box Adversarial Paraphrasing in Text Autoencoder Latent Space
Description: Reasoning segmentation models, which generate binary segmentation masks based on implicit text queries, are vulnerable to adversarial paraphrasing. This vulnerability allows an attacker to craft semantically equivalent and grammatically correct text prompts that significantly degrade the model's segmentation performance (measured by Intersection-over-Union, or IoU). The exploit uses a black-box, sentence-level optimization method (SPARTA) that operates in the continuous semantic latent space of a text autoencoder (e.g., SONAR). By employing reinforcement learning (Proximal Policy Optimization) to perturb latent vectors, the attack identifies specific phrasings that preserve the original intent but maximize the loss in the target model's mask generation process, bypassing standard semantic robustness checks.
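The optimization loop can be summarized as follows. This is a minimal illustrative sketch, not the authors' implementation: `encode`, `decode`, and `segment_iou` are hypothetical placeholders standing in for the SONAR text autoencoder and the target segmentation model's black-box query interface, and the PPO policy update used in SPARTA is replaced here by simple Gaussian random search over the latent vector for brevity.

```python
import numpy as np

def latent_paraphrase_attack(query, gt_mask, encode, decode, segment_iou,
                             steps=200, sigma=0.05, rng=None):
    """Sketch of a black-box latent-space paraphrase attack.

    encode(text)      -> sentence-level latent vector (e.g., a SONAR embedding)
    decode(latent)    -> paraphrased text decoded from the latent vector
    segment_iou(text, gt_mask) -> IoU of the target model's predicted mask
                                  against the ground-truth mask

    NOTE: SPARTA optimizes the latent perturbation with PPO and enforces
    semantic-similarity and grammaticality constraints; this sketch uses
    plain Gaussian random search purely to illustrate the loop structure.
    """
    rng = rng or np.random.default_rng(0)
    z0 = np.asarray(encode(query), dtype=np.float64)   # anchor latent for the benign query
    best_text, best_iou = query, segment_iou(query, gt_mask)

    for _ in range(steps):
        z = z0 + sigma * rng.standard_normal(z0.shape)  # perturb in latent space
        candidate = decode(z)                           # decode back to a candidate sentence
        iou = segment_iou(candidate, gt_mask)           # black-box query of the target model
        # Keep the paraphrase that most degrades segmentation quality.
        if iou < best_iou:
            best_text, best_iou = candidate, iou

    return best_text, best_iou
```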
Examples: See the SPARTA paper (Figure 3) and the associated repository for specific visual and textual examples of adversarial paraphrases generated for queries from the ReasonSeg and LLMSeg-40k datasets.
Impact: The vulnerability causes a failure in visual perception and reasoning tasks; the model becomes unable to segment or identify objects when the prompt is phrased in specific, valid ways. In real-world applications such as autonomous driving, robotics control, and interactive conversational systems, this inconsistency can lead to safety-critical failures where valid user commands or environmental descriptions are ignored or processed incorrectly due to linguistic variations.
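The degradation described above is typically quantified by comparing the mask produced for the original query with the mask produced for an adversarial paraphrase, each scored against the same ground truth. The snippet below is an illustrative consistency check, not tooling from the paper; `model.segment(image, text)` is a hypothetical inference call that should be replaced by the actual API of the model under test (e.g., LISA).

```python
import numpy as np

def iou(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    """Intersection-over-Union between two binary masks."""
    inter = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    return float(inter) / float(union) if union else 1.0

def paraphrase_iou_drop(model, image, query, paraphrase, gt_mask):
    """Measure how much a single paraphrase degrades segmentation quality.

    `model.segment(image, text)` is a hypothetical call returning a binary
    mask; a large positive drop indicates a vulnerable phrasing.
    """
    iou_original = iou(model.segment(image, query), gt_mask)
    iou_paraphrase = iou(model.segment(image, paraphrase), gt_mask)
    return iou_original - iou_paraphrase
```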
Affected Systems:
- LISA (Large Language Instructed Segmentation Assistant)
- LISA++
- GSVA (Generalized Segmentation Vision Assistant)
- Multimodal Large Language Models (MLLMs) utilizing the "embedding-as-mask" paradigm for reasoning segmentation.