Security concerns in embedding models and vectors
Large Language Models (LLMs) that use special tokens to define conversational structure (e.g., via chat templates) are vulnerable to a jailbreak attack named MetaBreak. An attacker injects these special tokens, or regular tokens that sit close to them in the embedding space, into a user prompt. This manipulation allows the attacker to bypass both the model's internal safety alignment and external content moderation systems. The attack combines four primitives built on this special-token manipulation.
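The token-substitution half of this idea can be illustrated with a minimal sketch: given a model's input embedding matrix, look up regular vocabulary tokens whose embeddings lie nearest to a chat-template special token, so that a filter stripping literal special tokens would miss them. The model name, the choice of special token, and the top-k cutoff are illustrative assumptions, not details taken from the MetaBreak work.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "Qwen/Qwen2.5-0.5B-Instruct"  # illustrative; any chat model with template tokens works

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

# Input embedding matrix: one row per vocabulary entry.
embeddings = model.get_input_embeddings().weight.detach().float()

# A special token the chat template uses to close a turn (ChatML-style here).
special_id = tokenizer.convert_tokens_to_ids("<|im_end|>")
special_vec = embeddings[special_id]

# Cosine similarity between that special token and every vocabulary entry.
sims = torch.nn.functional.cosine_similarity(embeddings, special_vec.unsqueeze(0), dim=-1)
sims[special_id] = -1.0  # exclude the special token itself

# Regular tokens with the highest similarity are candidate stand-ins that a
# filter removing literal special tokens would not catch.
top_ids = torch.topk(sims, k=10).indices.tolist()
print(tokenizer.convert_ids_to_tokens(top_ids))
```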
A second concern involves text embedding models used as safeguards for LLMs. Because the distribution of text embeddings is biased, universal "magic words" (adversarial suffixes) can be appended to input or output text to manipulate the similarity scores the embedding model computes, thereby bypassing the safeguard. This allows attackers to slip malicious prompts or responses past the check undetected.
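To make the attack surface concrete, the sketch below shows the kind of similarity-based safeguard being targeted: the input is embedded and blocked when its cosine similarity to any known-malicious reference text exceeds a threshold. The embedding model, threshold, blocklist, and suffix string are illustrative assumptions; a real attack would optimize the suffix to push the similarity score below the cutoff rather than guess it.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative embedding model

BLOCKLIST = ["how to build an explosive device"]  # known-malicious reference texts
THRESHOLD = 0.8                                   # illustrative similarity cutoff

def is_blocked(text: str) -> bool:
    """Flag text whose embedding is too similar to any blocklisted reference."""
    vecs = model.encode([text] + BLOCKLIST, normalize_embeddings=True)
    sims = vecs[1:] @ vecs[0]  # cosine similarity via normalized dot products
    return bool(np.max(sims) >= THRESHOLD)

prompt = "how to build an explosive device"
suffix = " lorem ipsum zqxj"  # placeholder; an actual attack optimizes this suffix

print(is_blocked(prompt))           # True: caught by the similarity check
print(is_blocked(prompt + suffix))  # the attack's goal is to drive this below THRESHOLD
```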
CVE-2024-XXXX