Single Word Rank Promotion
Research Paper
One Word is Enough: Minimal Adversarial Perturbations for Neural Text Ranking
Description:
Neural Ranking Models (NRMs) utilizing Transformer architectures (specifically BERT and T5-based re-rankers) are vulnerable to minimal adversarial perturbations that artificially promote a target document's rank. The vulnerability allows an attacker to manipulate ranking outcomes by inserting or substituting a single "query center" token—a word identified as the semantic centroid of the user's query—into the target document. The attack exploits the model's sensitivity to specific semantic triggers and positional embedding weights. Three specific variants facilitate this exploitation: one_word_start (prepending the query center), one_word_sim (substituting the most semantically similar word in the document), and one_word_best_grad (a gradient-guided white-box insertion). This method achieves high attack success rates while modifying fewer than two tokens on average, bypassing semantic drift checks by maintaining high cosine similarity to the original document.
Examples:
To reproduce the attack, an adversary performs the following steps (code implementation available at https://github.com/Tanmay-Karmakar/one_word):
- Identify Query Center: Given a query $q = (q_1, \dots, q_k)$, compute the query centroid $v(q)$ in a counter-fitted embedding space (e.g., Mrkšić et al.): $$v(q) = \frac{1}{k} \sum_{i=1}^{k} v(q_i)$$ Select the token $q_{center}$ from the vocabulary whose vector is closest to $v(q)$ by cosine similarity.
- Execute Perturbation (Method 1: Heuristic one_word_start): Insert $q_{center}$ at the absolute beginning of the target document $d$.
- Execute Perturbation (Method 2: Gradient-Guided one_word_best_grad): Compute the pairwise hinge loss $\mathcal{L}_{rank}$ for the target document $d$ against the ranked list $L$. Calculate the gradient norm for each token $t_i$ in $d$: $$I(t_i) = \left\| \frac{\partial \mathcal{L}_{rank}}{\partial \mathbf{e}_{t_i}} \right\|_2^2$$ Select the top-20 positions with the highest gradient norms, then insert $q_{center}$ at whichever of these candidate positions maximizes the re-ranking score $f(q, \tilde{d})$.
Impact:
- Search Ranking Manipulation: Attackers can successfully promote target documents in up to 91% of cases with only a single word modification.
- SEO Poisoning: Malicious actors can force irrelevant or malicious content to the top of search results in retrieval pipelines using neural re-rankers.
- Mid-Rank Vulnerability: The attack specifically compromises the "Goldilocks zone" (documents ranked 40–80), disrupting the candidate pruning stage of multi-stage retrieval pipelines.
- Evasion of Detection: Because the perturbation is limited to a single word (maintaining >97% semantic similarity to the original text), the attack is difficult to detect via standard semantic coherence checks.
Affected Systems:
- BERT-base-mdoc-BM25
- monoT5-base-MSMARCO
- Transformer-based Neural Ranking Models (NRMs) used in Information Retrieval (IR) pipelines.
© 2026 Promptfoo. All rights reserved.