Graph-LLM Semantic Attack
Research Paper
Unveiling the Vulnerability of Graph-LLMs: An Interpretable Multi-Dimensional Adversarial Attack on TAGs
Description: Graph-LLMs (Graph Neural Networks integrated with Large Language Models) used for representation learning on Text-Attributed Graphs (TAGs) are vulnerable to the Interpretable Multi-Dimensional Graph Attack (IMDGA). The vulnerability arises because text encoding and graph message passing are not decoupled, so perturbations to one node's text propagate through message passing to the predictions of its neighborhood. A black-box attacker can manipulate node classification predictions by executing a three-stage attack: (1) using a word-level Topological SHAP module to identify pivotal tokens based on their marginal contribution to the prediction of the target node and its neighborhood; (2) executing a Semantic Perturbation attack that replaces these tokens with context-aware synonyms generated by a Masked Language Model (MLM) so as to maximize the neighborhood confidence gap; and (3) pruning specific edges within a computed "nexus of vulnerability" (a high-influence subgraph) if text perturbation alone fails. The attack maintains high semantic similarity and preserves graph structural statistics (degree distribution and homophily), making the adversarial examples difficult to detect.
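At a high level, the three stages can be sketched as follows. This is a minimal illustration, not the authors' implementation: `predict` stands in for the black-box Graph-LLM's node-classification API, a simple leave-one-out occlusion score replaces the paper's word-level Topological SHAP, and the candidate synonym lists are assumed to come from the MLM step described above (see the linked repository for the actual code).

```python
# Illustrative sketch only; all interfaces (predict, texts, edges) are assumptions
# about the black-box setting, not the authors' code.
from typing import Callable, Dict, List, Tuple
import numpy as np

Graph = List[Tuple[int, int]]                                          # undirected edge list
PredictFn = Callable[[Dict[int, str], Graph], Dict[int, np.ndarray]]   # node -> class probabilities


def neighborhood_confidence(node: int, neighbors: List[int], texts: Dict[int, str],
                            edges: Graph, predict: PredictFn, label: int) -> float:
    """Mean predicted probability of the original label over the target node and its neighbors."""
    probs = predict(texts, edges)
    return float(np.mean([probs[v][label] for v in [node] + neighbors]))


def pivotal_tokens(node: int, neighbors: List[int], texts: Dict[int, str], edges: Graph,
                   predict: PredictFn, label: int, top_k: int = 5) -> List[int]:
    """Stage 1 (simplified): rank tokens by the confidence drop caused by masking them,
    an occlusion-style approximation of the Topological SHAP marginal contribution."""
    tokens = texts[node].split()
    base = neighborhood_confidence(node, neighbors, texts, edges, predict, label)
    drops = []
    for i in range(len(tokens)):
        masked = " ".join(t if j != i else "[MASK]" for j, t in enumerate(tokens))
        conf = neighborhood_confidence(node, neighbors, {**texts, node: masked},
                                       edges, predict, label)
        drops.append((base - conf, i))
    return [i for _, i in sorted(drops, reverse=True)[:top_k]]


def semantic_perturbation(node: int, neighbors: List[int], texts: Dict[int, str], edges: Graph,
                          predict: PredictFn, label: int,
                          candidates: Dict[int, List[str]]) -> Dict[int, str]:
    """Stage 2 (simplified): greedily replace each pivotal token with the MLM-proposed
    synonym that most reduces neighborhood confidence in the original label."""
    tokens = texts[node].split()
    for i, options in candidates.items():
        best = tokens[i]
        best_conf = neighborhood_confidence(node, neighbors, {**texts, node: " ".join(tokens)},
                                            edges, predict, label)
        for cand in options:
            trial = tokens.copy()
            trial[i] = cand
            conf = neighborhood_confidence(node, neighbors, {**texts, node: " ".join(trial)},
                                           edges, predict, label)
            if conf < best_conf:
                best, best_conf = cand, conf
        tokens[i] = best
    return {**texts, node: " ".join(tokens)}


def prune_vulnerable_edges(node: int, texts: Dict[int, str], edges: Graph,
                           predict: PredictFn, label: int, budget: int = 2) -> Graph:
    """Stage 3 (simplified): if the text attack alone does not flip the label, drop the
    incident edges whose removal most reduces confidence (the "nexus of vulnerability")."""
    edges = list(edges)
    for _ in range(budget):
        incident = [e for e in edges if node in e]
        if not incident or predict(texts, edges)[node].argmax() != label:
            break                                   # no edges left, or label already flipped
        scored = [(predict(texts, [e2 for e2 in edges if e2 != e])[node][label], e)
                  for e in incident]
        _, worst = min(scored)
        edges.remove(worst)
    return edges
```

In this sketch an attack run would call `pivotal_tokens`, generate MLM candidates for the top-ranked tokens, apply `semantic_perturbation`, and fall back to `prune_vulnerable_edges` only if the target label has not yet flipped.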
Examples: The following examples demonstrate text perturbations generated by the IMDGA framework that successfully flip node classification labels while maintaining semantic coherence.
Example 1 (Citation Network):
- Original Text: "Prior information and generalized questions: This paper ... uses a Bayesian decision theoretic framework, contrasting parallel and inverse decision problems, ... a subsequent risk minimization ..."
- Adversarial Text: "Prior knowledge and simplified questions: This paper ... uses a Bayesian decision theoretic system, contrasting parallel and opposite decision problems, ... a subsequent cost minimization ..."
- Result: The semantic meaning remains largely unchanged, but the node classification shifts due to the perturbation of pivotal tokens (e.g., framework $\to$ system, inverse $\to$ opposite).
Example 2 (Biological Data):
- Original Text: "Several computer algorithms for discovering patterns in groups of protein sequences ... and these algorithms are sometimes prone to producing models that are incorrect because two or ..."
- Adversarial Text: "Several computer algorithms for discovering patterns in sets of protein sequences ... and these methods are sometimes vulnerable to producing models that are inaccurate because two or ..."
- Result: Substitutions such as "groups" $\to$ "sets" and "algorithms" $\to$ "methods" effectively bypass the classifier.
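Substitutions like "framework" $\to$ "system" and "groups" $\to$ "sets" are the kind of context-aware candidates a masked language model proposes. A rough sketch of that candidate-generation step, assuming off-the-shelf Hugging Face models (bert-base-uncased for fill-mask, all-MiniLM-L6-v2 for a semantic-similarity filter) rather than the paper's exact models or thresholds:

```python
# Illustrative candidate generation for one pivotal token; model names and the
# similarity threshold are assumptions for exposition, not taken from the paper.
from typing import List

from transformers import pipeline
from sentence_transformers import SentenceTransformer, util

fill_mask = pipeline("fill-mask", model="bert-base-uncased")
sim_model = SentenceTransformer("all-MiniLM-L6-v2")


def mlm_candidates(text: str, token_index: int, sim_threshold: float = 0.9) -> List[str]:
    """Propose context-aware substitutes for one token while keeping the sentence's
    meaning close to the original (cosine similarity of sentence embeddings)."""
    tokens = text.split()
    original = tokens[token_index]
    masked = " ".join(fill_mask.tokenizer.mask_token if i == token_index else t
                      for i, t in enumerate(tokens))
    candidates = []
    for pred in fill_mask(masked, top_k=20):
        word = pred["token_str"].strip()
        if word.lower() == original.lower() or not word.isalpha():
            continue                                # skip the original word and subword pieces
        perturbed = " ".join(word if i == token_index else t for i, t in enumerate(tokens))
        score = util.cos_sim(sim_model.encode(text), sim_model.encode(perturbed)).item()
        if score >= sim_threshold:                  # keep only semantically faithful swaps
            candidates.append(word)
    return candidates


# Token 7 is "framework"; the returned candidates would then be scored against the
# black-box model as in the semantic_perturbation sketch above.
print(mlm_candidates("This paper uses a Bayesian decision theoretic framework for risk minimization", 7))
```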
For code and resources, see: https://anonymous.4open.science/r/IMDGA-7289
Impact:
- Model Evasion: Attackers can force the Graph-LLM to misclassify nodes (e.g., mislabeling a fraudulent user as benign in a social network or miscategorizing a paper in a citation network) with success rates exceeding 90% on benchmarks like Cora and Citeseer.
- Data Integrity Compromise: The attack alters the semantic interpretation of the graph data without triggering anomaly detection systems that rely on perplexity or graph homophily statistics (a toy check of these structural statistics appears after this list).
- Downstream Task Failure: Systems relying on these embeddings for recommendation, link prediction, or community detection will yield corrupted results due to the poisoned node representations.
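For illustration, here is a toy check of the two structural summaries named above: edge homophily (the fraction of edges whose endpoints share a label) and the degree distribution. The graph and labels are made up for the example, not data from the paper.

```python
# Compare the summary statistics a structural detector would monitor before and
# after edge pruning; small changes to these summaries are why such detectors miss the attack.
from collections import Counter
from typing import Dict, List, Tuple


def edge_homophily(edges: List[Tuple[int, int]], labels: Dict[int, int]) -> float:
    """Fraction of edges whose endpoints share a label."""
    same = sum(1 for u, v in edges if labels[u] == labels[v])
    return same / len(edges) if edges else 0.0


def degree_histogram(edges: List[Tuple[int, int]]) -> Counter:
    """Degree distribution of an undirected edge list, as a histogram over node degrees."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return Counter(deg.values())


labels = {0: 0, 1: 0, 2: 1, 3: 1, 4: 0}
clean = [(0, 1), (1, 2), (2, 3), (3, 4), (0, 4)]
attacked = [(0, 1), (1, 2), (2, 3), (0, 4)]         # one edge pruned by the attack
print(edge_homophily(clean, labels), edge_homophily(attacked, labels))
print(degree_histogram(clean), degree_histogram(attacked))
```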
Affected Systems:
- Graph-LLM architectures that integrate transformer-based text encoders (e.g., BERT, RoBERTa, Sentence-BERT) with Graph Neural Networks (e.g., GCN, GAT, GraphSAGE); a minimal sketch of such an architecture follows this list.
- Systems processing Text-Attributed Graphs (TAGs) for node classification tasks.
- Specific datasets shown to be vulnerable include Cora, Citeseer, PubMed, and ogbn-arxiv.
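For concreteness, a minimal sketch of the kind of architecture in scope, assuming a frozen BERT encoder mean-pooled into node features and a two-layer GCN classifier; the model choice, pooling, and layer sizes are illustrative assumptions, not a description of any specific attacked system.

```python
# Minimal sketch of an affected Graph-LLM: frozen transformer text encoder -> GCN classifier.
from typing import List

import torch
from torch import nn
from transformers import AutoModel, AutoTokenizer
from torch_geometric.nn import GCNConv


class TextAttributedGCN(nn.Module):
    def __init__(self, encoder_name: str = "bert-base-uncased",
                 hidden_dim: int = 128, num_classes: int = 7):
        super().__init__()
        self.tokenizer = AutoTokenizer.from_pretrained(encoder_name)
        self.encoder = AutoModel.from_pretrained(encoder_name)
        for p in self.encoder.parameters():          # frozen text encoder, trainable GNN
            p.requires_grad = False
        dim = self.encoder.config.hidden_size
        self.conv1 = GCNConv(dim, hidden_dim)
        self.conv2 = GCNConv(hidden_dim, num_classes)

    @torch.no_grad()
    def encode_texts(self, texts: List[str]) -> torch.Tensor:
        """Mean-pooled token embeddings as node features; any change to one node's text
        propagates through message passing to its neighbors' predictions."""
        batch = self.tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
        hidden = self.encoder(**batch).last_hidden_state
        mask = batch["attention_mask"].unsqueeze(-1)
        return (hidden * mask).sum(1) / mask.sum(1)

    def forward(self, texts: List[str], edge_index: torch.Tensor) -> torch.Tensor:
        x = self.encode_texts(texts)
        x = torch.relu(self.conv1(x, edge_index))
        return self.conv2(x, edge_index)             # per-node class logits
```

Here `edge_index` is the standard [2, num_edges] tensor; in terms of the Description above, the attacker perturbs entries of `texts` and, if needed, removes columns from `edge_index`.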