Black-Box Graph-Text Node Injection
Research Paper
GRAPHTEXTACK: A Realistic Black-Box Node Injection Attack on LLM-Enhanced GNNs
Description: LLM-enhanced Graph Neural Networks (GNNs), which integrate Large Language Model (LLM) feature encoders with graph message-passing architectures, are vulnerable to a black-box node injection attack known as "GraphTextack." This vulnerability exists because the joint model architecture creates a dual attack surface: the GNN component is sensitive to structural perturbations (changes in graph topology), while the LLM component is sensitive to semantic perturbations (adversarial phrasing).
An attacker can exploit this by injecting new nodes into the graph without modifying existing nodes or edges. The attack utilizes an evolutionary optimization framework to jointly optimize the structural connectivity (edges) and semantic features (textual attributes) of the injected nodes. By maximizing a multi-objective fitness function that balances local prediction shifts (semantic disruption) and global graph centrality (structural influence via PageRank), the attacker can significantly degrade the node classification performance of the target model. This attack operates in a black-box setting, requiring only model prediction queries, and does not require access to gradients, model weights, or surrogate models.
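The injection-only threat model can be illustrated with a minimal sketch using networkx; the `inject_nodes` helper, the karate-club stand-in graph, and the toy node/edge choices are illustrative assumptions, not artifacts of the attack itself:

```python
import networkx as nx

def inject_nodes(graph, injected, edges):
    """Return a copy of `graph` with new nodes and their edges added.

    Existing nodes and edges are left untouched: the attack only *adds*
    to the graph, matching the node-injection threat model.
    """
    attacked = graph.copy()
    attacked.add_nodes_from(injected)
    attacked.add_edges_from(edges)
    return attacked

clean = nx.karate_club_graph()            # stand-in for the target graph
attacked = inject_nodes(clean, [100, 101], [(100, 0), (101, 33), (100, 101)])

assert clean.number_of_nodes() == 34      # original graph untouched
assert attacked.number_of_nodes() == 36   # two injected nodes
assert set(clean.edges()) <= set(attacked.edges())
```

Because no existing edge or feature is modified, defenses that monitor changes to known nodes do not trigger; only anomaly detection over newly added nodes applies.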
Examples: To reproduce the GraphTextack vulnerability on an LLM-enhanced GNN (e.g., "One-for-all" or a GCN using e5-large-v2 embeddings):
- Initialization:
- Define a target graph $G=(V, E, T, Y)$.
- Initialize a population of candidate injection strategies $P_0$. Each candidate represents a set of injected nodes $V'$, edges $E'$, and a feature generation strategy (class-conditioned sampling from the empirical distribution of existing nodes).
- Set the injection budget (e.g., $r = 0.01$ to $0.05$ of graph size).
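The initialization step above can be sketched as follows. The candidate encoding (a dict of features and edge targets), the fixed per-node degree, and all hyperparameter values are illustrative assumptions; only the class-conditioned sampling of features from existing nodes and the budget $r$ come from the description:

```python
import numpy as np

rng = np.random.default_rng(0)

def init_population(features, labels, n_nodes, budget, pop_size, degree=3):
    """Sample an initial population P0 of injection candidates.

    Each candidate injects `budget` nodes; a node's feature vector is
    drawn class-conditionally (pick a class, then copy a random existing
    node of that class), and its edges connect to `degree` random targets.
    """
    classes = np.unique(labels)
    population = []
    for _ in range(pop_size):
        nodes = []
        for _ in range(budget):
            c = rng.choice(classes)                        # class-conditioned
            donor = rng.choice(np.flatnonzero(labels == c))
            edges = rng.choice(n_nodes, size=degree, replace=False)
            nodes.append({"feat": features[donor].copy(), "edges": edges})
        population.append(nodes)
    return population

# toy graph: 20 nodes, 8-dim features, 3 classes, budget r = 0.05
X = rng.normal(size=(20, 8))
y = rng.integers(0, 3, size=20)
budget = max(1, int(0.05 * 20))
P0 = init_population(X, y, n_nodes=20, budget=budget, pop_size=10)
assert len(P0) == 10 and len(P0[0]) == budget
```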
- Evolutionary Optimization Loop:
- For each generation, query the target black-box model to obtain confidence scores for nodes in the 2-hop neighborhood of the injected nodes.
- Calculate Fitness: Compute the fitness score for each candidate using the formula: $\text{Fitness}(s_i) = \alpha \cdot \Delta_{\text{conf}}(s_i) + \beta \cdot \text{PR}(s_i)$.
- $\Delta_{\text{conf}}$ measures the local prediction shift (drop in confidence of the ground truth label).
- $\text{PR}$ measures the PageRank score of the injected node (global structural influence).
- Selection: Retain top $N_e$ elite candidates based on fitness.
- Crossover: Combine edge connections and feature strategies from pairs of elite candidates.
- Mutation: Randomly alter edge connections or feature assignments with probability $p_{\text{mut}}$.
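The loop above can be sketched with simple evolutionary operators. The fitness formula follows $\alpha \cdot \Delta_{\text{conf}} + \beta \cdot \text{PR}$ from the description, but the hyperparameter values, the per-node crossover rule, and the random scores standing in for black-box queries are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
# illustrative hyperparameters, not the paper's values
ALPHA, BETA, P_MUT, N_ELITE = 1.0, 0.5, 0.2, 2

def fitness(delta_conf, pagerank):
    """Fitness(s_i) = alpha * Delta_conf(s_i) + beta * PR(s_i)."""
    return ALPHA * delta_conf + BETA * pagerank

def select(population, scores, n_elite=N_ELITE):
    """Keep the n_elite candidates with the highest fitness."""
    order = np.argsort(scores)[::-1]
    return [population[i] for i in order[:n_elite]]

def crossover(parent_a, parent_b):
    """Per injected node, inherit edges and features from one parent."""
    child = []
    for na, nb in zip(parent_a, parent_b):
        pick = na if rng.random() < 0.5 else nb
        child.append({"feat": pick["feat"].copy(),
                      "edges": pick["edges"].copy()})
    return child

def mutate(candidate, n_nodes, p_mut=P_MUT):
    """Rewire one edge of each injected node with probability p_mut."""
    for node in candidate:
        if rng.random() < p_mut:
            i = rng.integers(len(node["edges"]))
            node["edges"][i] = rng.integers(n_nodes)
    return candidate

# toy population: 6 candidates, each injecting 2 nodes into a 20-node graph
pop = [[{"feat": rng.normal(size=4), "edges": rng.choice(20, 3, replace=False)}
        for _ in range(2)] for _ in range(6)]
# in the real attack these scores come from black-box queries (confidence
# drop over the 2-hop neighborhood) and PageRank of the injected nodes;
# random values stand in here
scores = [fitness(rng.random(), rng.random()) for _ in pop]
elites = select(pop, scores)
child = mutate(crossover(elites[0], elites[1]), n_nodes=20)
assert len(elites) == N_ELITE
assert len(child) == 2 and all(len(n["edges"]) == 3 for n in child)
```

One generation is then: score every candidate via model queries, keep the elites, and refill the population with mutated crossover children; after $T_{\text{gen}}$ generations the best-scoring candidate is injected.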
- Injection:
- Select the candidate with the highest fitness after $T_{\text{gen}}$ generations.
- Inject the adversarial nodes into the graph and run the inference task.
- Result: Observe a statistically significant drop in node classification accuracy (e.g., up to ~15-20% degradation on standard benchmarks compared to clean data).
Impact:
- Model Poisoning: The integrity of the LLM-enhanced GNN is compromised, leading to incorrect classification of benign nodes.
- Performance Degradation: Significant reduction in overall accuracy for downstream tasks such as node classification.
- Manipulation of Graph Metrics: Injected nodes with high centrality can manipulate recommendation systems (e.g., in e-commerce graphs) or scholarly metrics (e.g., in citation networks).
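The centrality-manipulation point can be demonstrated with a small networkx sketch; the Barabási-Albert stand-in graph, the edge budget of 25, and the choice of targets are illustrative assumptions:

```python
import networkx as nx

# stand-in for a product or citation graph
G = nx.barabasi_albert_graph(50, 2, seed=0)
G.add_node("inj")
# wire the injected node to many existing nodes within an edge budget
G.add_edges_from(("inj", t) for t in range(25))

pr = nx.pagerank(G)
# a heavily connected injected node lands well above the median centrality,
# letting it sway ranking-based downstream systems
median_pr = sorted(pr.values())[len(pr) // 2]
assert pr["inj"] > median_pr
```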
Affected Systems:
- Architectures: LLM-enhanced GNNs, specifically those using the "LLM-as-enhancer" paradigm where LLM-derived embeddings are aggregated via GNNs (e.g., One-for-all, GCN + e5-large-v2).
- Applications: Systems relying on Text-Attributed Graphs (TAGs) for classification, including citation networks (e.g., Cora, PubMed, ogbn-arxiv), e-commerce product graphs (e.g., ogbn-products), and social networks.
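To make the "LLM-as-enhancer" paradigm concrete, here is a minimal numpy sketch of one GCN propagation step over frozen LLM-derived node embeddings; the tiny dimensions and random weights are illustrative (e5-large-v2 actually produces 1024-dimensional embeddings):

```python
import numpy as np

def gcn_layer(A, X, W):
    """One GCN step: ReLU(D^-1/2 (A+I) D^-1/2 X W)."""
    A_hat = A + np.eye(A.shape[0])                  # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ X @ W, 0.0)

rng = np.random.default_rng(0)
# stand-in for frozen LLM text embeddings, one row per node of the
# text-attributed graph (8 dims here for brevity)
X = rng.normal(size=(4, 8))
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
W = rng.normal(size=(8, 3))                         # 3 output classes
H = gcn_layer(A, X, W)
assert H.shape == (4, 3) and (H >= 0).all()
```

Because the GNN aggregates neighbors' LLM embeddings, an injected node's adversarial text propagates into the representations of every node it connects to, which is exactly the dual attack surface described above.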
© 2026 Promptfoo. All rights reserved.