Embedding Inversion Privacy Leak
Research Paper
Safeguarding LLM Embeddings in End-Cloud Collaboration via Entropy-Driven Perturbation
Description: Retrieval-Augmented Generation (RAG) systems employing standard dense embedding models (e.g., Sentence-T5, SimCSE-BERT, RoBERTa, MPNet) for End-Cloud collaboration are vulnerable to Embedding Inversion Attacks (EIA). While embeddings are vector representations designed to be unreadable to humans, they retain enough semantic information for an attacker with access to the vectors (e.g., a malicious or compromised cloud provider) to reconstruct the original sensitive plaintext input.
The vulnerability exists because the mapping from text to embedding space is reversible via two primary methods:
- Optimization-based attacks: Iteratively optimizing a random text sequence until its embedding matches the target vector (e.g., Vec2Text).
- Learning-based attacks: Training a generative model (e.g., GPT-2, Llama3) to approximate the inverse function $S(f(x)) \approx x$ by minimizing the cross-entropy loss between original and recovered texts.
Tests indicate that, absent dedicated perturbation defenses, standard embeddings permit near-perfect semantic recovery of private user inputs.
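As an illustration of the optimization-based route, the sketch below performs naive hill-climbing over a toy vocabulary until a candidate sentence's embedding approaches the captured target vector. The encoder name and vocabulary are placeholder assumptions; real attacks such as Vec2Text use a trained correction model and are far more effective, but the core loop is the same: mutate the candidate text and keep mutations that move its embedding closer to the target.

```python
# Naive hill-climbing sketch of an optimization-based inversion attack.
import random
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-mpnet-base-v2")  # stands in for the victim's embedding model
# Toy vocabulary for illustration; a real attacker searches the full token space.
vocab = ["attack", "of", "diarrhea", "meal", "play", "guitar", "baking", "the", "what", "can"]

def invert(target_embedding, length=4, steps=500):
    """Greedily mutate a random token sequence so its embedding approaches the target."""
    tokens = [random.choice(vocab) for _ in range(length)]
    best_sim = util.cos_sim(encoder.encode(" ".join(tokens)), target_embedding).item()
    for _ in range(steps):
        candidate = tokens.copy()
        candidate[random.randrange(length)] = random.choice(vocab)
        sim = util.cos_sim(encoder.encode(" ".join(candidate)), target_embedding).item()
        if sim > best_sim:  # keep the mutation only if it moves the embedding closer
            tokens, best_sim = candidate, sim
    return " ".join(tokens), best_sim

# The attacker only needs the captured vector, never the plaintext:
target = encoder.encode("attack of diarrhea")
recovered, similarity = invert(target)
print(recovered, round(similarity, 3))
```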
Examples: The following examples demonstrate the reconstruction of private inputs from their vector embeddings using a learning-based attack model (e.g., GEIA) as detailed in the research:
- Case 1 (Health Data Leakage):
  - Original Input: "attack of diarrhea"
  - Recovered Text: "what can diarrhea meal be?"
  - Observation: High semantic retention of sensitive medical keywords.
- Case 2 (Semantic Reconstruction):
  - Original Input: "I thought you said you play gaming. i am into gamer"
  - Recovered Text: "I thought you said you play guitar. i am into baking"
  - Observation: While specific nouns shifted, the syntactic structure and relationship data were preserved, allowing for potential inference of the original intent.
To reproduce a learning-based attack (a code sketch follows these steps):
- Extract text-embedding pairs $(x, f(x))$ from a public corpus (e.g., Persona-Chat).
- Train a decoder model $\Phi$ (e.g., GPT-2) to minimize the cross-entropy loss: $$L_{ce}(x; \theta_{\Phi}) = -\sum_{i} \log P(w_i \mid f(x), w_0, \ldots, w_{i-1})$$
- Feed the victim's captured embedding $f(x_{victim})$ into the trained decoder $\Phi$ to generate $\hat{x}$.
See arXiv:2405.18540 for dataset details and attack configurations.
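Below is a minimal training-loop sketch of the learning-based attack, assuming the attacker holds (text, embedding) pairs from a public corpus and uses GPT-2 as the decoder $\Phi$. The linear projection, model names, and hyperparameters are illustrative assumptions rather than the exact GEIA configuration: the captured sentence embedding is projected into GPT-2's embedding space and prepended as a prefix, and the decoder is trained to reconstruct $x$ under the loss $L_{ce}$ above.

```python
# Sketch of training an inversion decoder Phi on (text, embedding) pairs.
import torch
from torch import nn
from transformers import GPT2LMHeadModel, GPT2Tokenizer
from sentence_transformers import SentenceTransformer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token
decoder = GPT2LMHeadModel.from_pretrained("gpt2")
victim_encoder = SentenceTransformer("all-mpnet-base-v2")        # assumed victim embedding model f
project = nn.Linear(victim_encoder.get_sentence_embedding_dimension(),
                    decoder.config.n_embd)                       # map f(x) into GPT-2's embedding space
optimizer = torch.optim.AdamW(list(decoder.parameters()) + list(project.parameters()), lr=5e-5)

def training_step(texts):
    """One gradient step: maximize P(w_i | f(x), w_0..w_{i-1}) over a batch of texts."""
    emb = torch.tensor(victim_encoder.encode(texts))             # f(x), shape (B, d)
    prefix = project(emb).unsqueeze(1)                           # (B, 1, n_embd) prefix "token"
    batch = tokenizer(texts, return_tensors="pt", padding=True)
    tok_emb = decoder.transformer.wte(batch["input_ids"])        # (B, T, n_embd)
    inputs_embeds = torch.cat([prefix, tok_emb], dim=1)
    attn = torch.cat([torch.ones(emb.size(0), 1, dtype=torch.long),
                      batch["attention_mask"]], dim=1)
    labels = torch.cat([torch.full((emb.size(0), 1), -100),      # no loss on the prefix slot
                        batch["input_ids"].masked_fill(batch["attention_mask"] == 0, -100)], dim=1)
    loss = decoder(inputs_embeds=inputs_embeds, attention_mask=attn, labels=labels).loss
    loss.backward(); optimizer.step(); optimizer.zero_grad()
    return loss.item()

# After training, the attacker projects a captured embedding f(x_victim) through
# `project` and lets the decoder generate the reconstruction x_hat.
```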
Impact:
- Confidentiality Violation: Direct exposure of user inputs, including PII, medical history, and financial data, to cloud providers or attackers intercepting embedding traffic.
- Privacy Regulatory Non-compliance: Potential violation of GDPR and similar regulations governing the processing of user data, since embeddings that permit reconstruction of the original text cannot be treated as anonymized data.
- Contextual Inference: Even imperfect reconstructions allow attackers to infer the semantic topic and intent of user queries.
Affected Systems:
- End-Cloud RAG architectures offloading embedding storage or retrieval.
- Systems utilizing the following embedding models without defensive perturbation:
- Sentence-T5 (Base, Large, XL, XXL)
- SimCSE-BERT
- RoBERTa (all-roberta-large)
- MPNet (all-mpnet-base)
- GTR-T5
Mitigation Steps:
- Entropy-Driven Perturbation (EntroGuard): Implement a perturbation mechanism that increases the information entropy of the embedding to disrupt the intermediate results of transformer-based recovery models. This steers the reconstruction toward meaningless text.
- Bound-Aware Perturbation Adaptation: Constrain the perturbation noise within a specific similarity bound (e.g., $\epsilon \approx 0.036$ for normalized vectors) so that the embeddings remain useful for retrieval while their invertibility is disrupted (see the sketch after this list).
- Noise Allocation: Apply the "reduce where redundant, increase where sparse" strategy to the perturbation, preserving the semantic information critical for retrieval while concentrating noise in the sparse dimensions exploited by inversion.
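The sketch below illustrates only the bound-aware constraint, assuming unit-normalized embeddings and the similarity budget $\epsilon \approx 0.036$ quoted above: an orthogonal noise component is sized so the cosine similarity to the original vector never drops below $1 - \epsilon$. It is not the full EntroGuard algorithm; the entropy-driven shaping of the noise across dimensions (the "reduce where redundant, increase where sparse" step) is omitted.

```python
# Bound-aware perturbation sketch: keep cos(e, e') >= 1 - EPSILON.
import numpy as np

EPSILON = 0.036  # maximum allowed drop in cosine similarity (retrieval utility budget)

def perturb(embedding: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Add noise orthogonal to the embedding, scaled to stay within the similarity bound."""
    e = embedding / np.linalg.norm(embedding)
    noise = rng.standard_normal(e.shape)
    noise -= noise.dot(e) * e                 # keep only the component orthogonal to e
    noise /= np.linalg.norm(noise)
    # For unit e and orthogonal noise of norm t: cos(e, e') = 1 / sqrt(1 + t^2).
    # Solving 1 / sqrt(1 + t^2) = 1 - EPSILON gives the largest admissible t.
    t = np.sqrt(1.0 / (1.0 - EPSILON) ** 2 - 1.0)   # ~0.276 for EPSILON = 0.036
    perturbed = e + t * noise
    return perturbed / np.linalg.norm(perturbed)

rng = np.random.default_rng(0)
e = rng.standard_normal(768); e /= np.linalg.norm(e)
e_prime = perturb(e, rng)
print(float(e @ e_prime))   # ~0.964, i.e. similarity loss bounded exactly by EPSILON
```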