LMVD-ID: bd566a31
Published March 1, 2025

Embedding Inversion Privacy Leak

Affected Models: Llama 3

Research Paper

Safeguarding LLM Embeddings in End-Cloud Collaboration via Entropy-Driven Perturbation

Description: Retrieval-Augmented Generation (RAG) systems that use standard dense embedding models (e.g., Sentence-T5, SimCSE-BERT, RoBERTa, MPNet) for End-Cloud collaboration are vulnerable to Embedding Inversion Attacks (EIA). Although embeddings are vector representations that are not directly human-readable, they retain enough semantic information for an attacker with access to the vectors (e.g., a malicious or compromised cloud provider) to reconstruct the original sensitive plaintext input.

The vulnerability exists because the mapping from text to embedding space is approximately invertible via two primary attack methods:

  1. Optimization-based attacks: Iteratively refining a randomly initialized text sequence until its embedding matches the target vector (e.g., Vec2Text); a minimal sketch follows this list.
  2. Learning-based attacks: Training a generative model (e.g., GPT-2, Llama3) to approximate the inverse function $S(f(x)) \approx x$ by minimizing the cross-entropy loss between original and recovered texts.
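
As an illustration of the first category only, the following is a minimal hill-climbing sketch, not Vec2Text itself: it mutates a candidate word sequence and keeps changes that move its embedding closer to an intercepted target vector. The encoder name all-mpnet-base-v2, the toy vocabulary, and the iteration budget are assumptions for illustration; real attacks use far larger search spaces and model-guided proposals.

```python
# Minimal hill-climbing inversion sketch (illustrative only; not Vec2Text).
# Assumptions: the attacker holds the target embedding and a local copy of the
# same (or a similar) encoder; the vocabulary and step budget are toy values.
import random
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-mpnet-base-v2")      # assumed surrogate encoder
target = encoder.encode("attack of diarrhea")           # stands in for the intercepted vector

vocab = ["attack", "of", "diarrhea", "meal", "what", "can", "be", "severe", "stomach", "pain"]
candidate = [random.choice(vocab) for _ in range(4)]

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

best = cosine(encoder.encode(" ".join(candidate)), target)
for _ in range(300):
    trial = candidate.copy()
    trial[random.randrange(len(trial))] = random.choice(vocab)   # mutate one word
    score = cosine(encoder.encode(" ".join(trial)), target)
    if score > best:                                             # keep moves toward the target
        candidate, best = trial, score

print(" ".join(candidate), round(best, 3))                       # approximate reconstruction
```

Even this naive search can recover sensitive keywords whenever the candidate vocabulary overlaps the private input; published attacks such as Vec2Text are far more effective.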

Tests indicate that, absent specific perturbation defenses, standard embeddings permit near-perfect semantic recovery of private user inputs.

Examples: The following examples demonstrate the reconstruction of private inputs from their vector embeddings using a learning-based attack model (e.g., GEIA) as detailed in the research:

  • Case 1 (Health Data Leakage):
    • Original Input: "attack of diarrhea"
    • Recovered Text: "what can diarrhea meal be?"
    • Observation: High semantic retention of sensitive medical keywords.

  • Case 2 (Semantic Reconstruction):
    • Original Input: "I thought you said you play gaming. i am into gamer"
    • Recovered Text: "I thought you said you play guitar. i am into baking"
    • Observation: Although specific nouns shifted, the syntactic structure and relational content were preserved, allowing inference of the original intent.

To reproduce a learning-based attack:

  1. Extract text-embedding pairs $(x, f(x))$ from a public corpus (e.g., Persona-Chat).
  2. Train a decoder model $\Phi$ (e.g., GPT-2) to minimize the cross-entropy loss: $$L_{ce}(x; \theta_{\Phi}) = -\sum_{i} \log P(w_i \mid f(x), w_0, \ldots, w_{i-1})$$
  3. Feed the victim's captured embedding $f(x_{victim})$ into the trained decoder $\Phi$ to generate $\hat{x}$.
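
A hedged sketch of steps 2–3, assuming GPT-2 from Hugging Face transformers as the decoder $\Phi$ and a 768-dimensional victim embedding; the projection layer, hyperparameters, and single-pair training step are illustrative assumptions, not the paper's exact configuration.

```python
# Sketch of a learning-based inversion decoder: the sentence embedding f(x) is
# projected into GPT-2's embedding space, prepended as a soft prefix token, and
# the model is trained to reproduce the original text x with cross-entropy loss.
import torch
from torch import nn
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
decoder = GPT2LMHeadModel.from_pretrained("gpt2")
proj = nn.Linear(768, decoder.config.n_embd)        # 768 = assumed victim embedding size
optimizer = torch.optim.AdamW(list(decoder.parameters()) + list(proj.parameters()), lr=5e-5)

def training_step(embedding: torch.Tensor, text: str) -> float:
    """One step of L_ce minimization on a single (f(x), x) pair."""
    ids = tokenizer(text, return_tensors="pt").input_ids            # (1, T)
    tok_emb = decoder.transformer.wte(ids)                          # (1, T, n_embd)
    prefix = proj(embedding).view(1, 1, -1)                         # (1, 1, n_embd)
    inputs = torch.cat([prefix, tok_emb], dim=1)                    # condition on f(x)
    labels = torch.cat([torch.full((1, 1), -100, dtype=torch.long), ids], dim=1)  # pad label for prefix slot
    loss = decoder(inputs_embeds=inputs, labels=labels).loss        # cross-entropy L_ce
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```

At inference time (step 3), the captured embedding $f(x_{victim})$ is projected the same way and standard autoregressive decoding from the prefix yields the reconstruction $\hat{x}$.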

See arXiv:2405.18540 for dataset details and attack configurations.

Impact:

  • Confidentiality Violation: Direct exposure of user inputs, including PII, medical history, and financial data, to cloud providers or attackers intercepting embedding traffic.
  • Privacy Regulatory Non-compliance: Violation of GDPR and similar regulations regarding the processing of user data, as embeddings cannot be treated as anonymized data.
  • Contextual Inference: Even imperfect reconstructions allow attackers to infer the semantic topic and intent of user queries.

Affected Systems:

  • End-Cloud RAG architectures offloading embedding storage or retrieval.
  • Systems utilizing the following embedding models without defensive perturbation:
    • Sentence-T5 (Base, Large, XL, XXL)
    • SimCSE-BERT
    • RoBERTa (all-roberta-large)
    • MPNet (all-mpnet-base)
    • GTR-T5

Mitigation Steps:

  • Entropy-Driven Perturbation (EntroGuard): Implement a perturbation mechanism that increases the information entropy of the embedding to disrupt the intermediate results of transformer-based recovery models. This steers the reconstruction toward meaningless text.
  • Bound-Aware Perturbation Adaptation: Constrain the perturbation noise within a specific similarity bound (e.g., $\epsilon \approx 0.036$ for normalized vectors) so that the embeddings remain useful for retrieval tasks while their invertibility is destroyed; see the sketch after this list.
  • Noise Allocation Strategy: Apply the "reduce where redundant, increase where sparse" principle to the perturbation noise, preserving the semantic information critical for retrieval while maximizing noise in the sparse dimensions exploited by inversion models.
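
The following is a minimal sketch of the bound-aware idea only, under stated assumptions; it is not the EntroGuard algorithm. Noise is weighted toward low-magnitude ("sparse") dimensions and then scaled so the perturbed, re-normalized embedding stays within a cosine-distance bound $\epsilon$ of the original; the weighting heuristic and the binary search over the noise scale are illustrative choices.

```python
# Illustrative bound-aware perturbation sketch (not EntroGuard itself).
# Assumptions: unit-normalized embeddings, eps ~ 0.036 as the cosine-distance
# budget, and a simple inverse-magnitude weighting as the sparsity heuristic.
import numpy as np

def bounded_perturb(e: np.ndarray, eps: float = 0.036, seed: int = 0) -> np.ndarray:
    rng = np.random.default_rng(seed)
    e = e / np.linalg.norm(e)                       # work on the normalized embedding
    weights = 1.0 / (np.abs(e) + 1e-6)              # "increase where sparse" heuristic
    weights /= weights.max()
    noise = rng.standard_normal(e.shape) * weights
    noise /= np.linalg.norm(noise)

    # Binary-search the largest noise scale with 1 - cos(e, e') <= eps.
    lo, hi = 0.0, 2.0
    for _ in range(50):
        mid = (lo + hi) / 2
        p = e + mid * noise
        p /= np.linalg.norm(p)
        if 1.0 - float(np.dot(e, p)) <= eps:
            lo = mid
        else:
            hi = mid
    p = e + lo * noise
    return p / np.linalg.norm(p)
```

In practice, retained retrieval utility should be validated empirically (e.g., top-k recall before and after perturbation), since the similarity bound alone does not guarantee that inversion quality degrades.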

© 2026 Promptfoo. All rights reserved.