LMVD-ID: acd5dfcd
Published October 1, 2025

Graph-LLM Semantic Attack

Research Paper

Unveiling the Vulnerability of Graph-LLMs: An Interpretable Multi-Dimensional Adversarial Attack on TAGs


Description: Graph-LLMs (Graph Neural Networks integrated with Large Language Models) used for representation learning on Text-Attributed Graphs (TAGs) are vulnerable to the Interpretable Multi-Dimensional Graph Attack (IMDGA). The vulnerability arises because the text-encoding and graph message-passing mechanisms are not decoupled. A black-box attacker can manipulate node classification predictions by executing a three-stage attack: (1) using a word-level Topological SHAP module to identify pivotal tokens based on their marginal contribution to the predictions for the target node and its neighborhood; (2) executing a semantic perturbation step that replaces these tokens with context-aware synonyms generated by a Masked Language Model (MLM) so as to maximize the neighborhood confidence gap; and (3) pruning specific edges within a computed "nexus of vulnerability" (a high-influence subgraph) if text perturbation alone fails. The attack maintains high semantic similarity and preserves graph structural statistics (degree distribution and homophily), making the adversarial examples difficult to detect.
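
The first stage can be illustrated with a Shapley-style token-importance estimate: reveal tokens in a random order, query the victim model as a black box, and average each token's marginal effect on the confidence assigned to the current prediction for the target node and its neighborhood. The sketch below is a minimal approximation of that idea, not the paper's implementation; the `predict_confidence` interface, the `[MASK]` placeholder, and the sample count are assumptions.

```python
import random
from typing import Callable, List, Sequence

# Hypothetical black-box interface: given the node's token list (with some
# tokens masked out) and the fixed graph, return the classifier's confidence
# in the currently predicted label, averaged over the target node and its
# 1-hop neighborhood.
PredictFn = Callable[[List[str]], float]

MASK = "[MASK]"  # placeholder used when a token is "absent" from a coalition


def topological_shap(tokens: Sequence[str],
                     predict_confidence: PredictFn,
                     n_samples: int = 64,
                     seed: int = 0) -> List[float]:
    """Monte-Carlo estimate of each token's marginal contribution to the
    neighborhood-averaged prediction confidence (a SHAP-style score).

    Sketch only: the paper's Topological SHAP module weights the target node
    and its neighbors explicitly; here that weighting is assumed to live
    inside `predict_confidence`.
    """
    rng = random.Random(seed)
    n = len(tokens)
    scores = [0.0] * n

    for _ in range(n_samples):
        order = list(range(n))
        rng.shuffle(order)
        # Reveal tokens one by one in a random order and record the
        # confidence change each token causes when it appears.
        present = [MASK] * n
        prev_conf = predict_confidence(list(present))
        for idx in order:
            present[idx] = tokens[idx]
            conf = predict_confidence(list(present))
            scores[idx] += conf - prev_conf
            prev_conf = conf

    return [s / n_samples for s in scores]
```

Tokens with the largest positive scores prop up the current label most strongly and are therefore the natural targets for the semantic perturbation stage.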

Examples: The following examples show text perturbations generated by the IMDGA framework that flip node classification labels while maintaining semantic coherence; a hedged sketch of how such context-aware substitutions can be produced follows the examples.

  • Example 1 (Citation Network):
      • Original Text: "Prior information and generalized questions: This paper ... uses a Bayesian decision theoretic framework, contrasting parallel and inverse decision problems, ... a subsequent risk minimization ..."
      • Adversarial Text: "Prior knowledge and simplified questions: This paper ... uses a Bayesian decision theoretic system, contrasting parallel and opposite decision problems, ... a subsequent cost minimization ..."
      • Result: The semantic meaning remains largely unchanged, but the node classification flips due to the perturbation of pivotal tokens (e.g., framework → system, inverse → opposite).

  • Example 2 (Biological Data):
      • Original Text: "Several computer algorithms for discovering patterns in groups of protein sequences ... and these algorithms are sometimes prone to producing models that are incorrect because two or ..."
      • Adversarial Text: "Several computer algorithms for discovering patterns in sets of protein sequences ... and these methods are sometimes vulnerable to producing models that are inaccurate because two or ..."
      • Result: Substitutions such as "groups" → "sets" and "algorithms" → "methods" flip the classifier's prediction while preserving the passage's meaning.
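
Substitutions like those above can be produced with an off-the-shelf masked language model. The sketch below masks a pivotal token and reads the MLM's top predictions as candidate replacements; the model choice (`bert-base-uncased`), the `candidate_substitutions` helper, and the filtering heuristic are illustrative assumptions. The paper's perturbation module additionally scores candidates by the neighborhood confidence gap they induce in the victim Graph-LLM, which requires black-box queries and is omitted here.

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

# Assumed MLM backbone; any masked language model with a compatible tokenizer works.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
mlm = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
mlm.eval()


def candidate_substitutions(text: str, target_word: str, top_k: int = 10):
    """Propose context-aware replacements for `target_word` by masking its
    first occurrence and reading the MLM's top predictions."""
    masked = text.replace(target_word, tokenizer.mask_token, 1)
    inputs = tokenizer(masked, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = mlm(**inputs).logits
    mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
    top_ids = logits[0, mask_pos[0]].topk(top_k).indices.tolist()
    candidates = [tokenizer.decode([i]).strip() for i in top_ids]
    # Drop the original word and sub-word artifacts so only clean candidates remain.
    return [c for c in candidates if c.isalpha() and c.lower() != target_word.lower()]


# Example query against the citation-network sentence above (toy usage).
print(candidate_substitutions(
    "This paper uses a Bayesian decision theoretic framework for inference.",
    "framework"))
```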

For code and resources, see: https://anonymous.4open.science/r/IMDGA-7289

Impact:

  • Model Evasion: Attackers can force the Graph-LLM to misclassify nodes (e.g., mislabeling a fraudulent user as benign in a social network or miscategorizing a paper in a citation network) with success rates exceeding 90% on benchmarks like Cora and Citeseer.
  • Data Integrity Compromise: The attack alters the semantic interpretation of the graph data without triggering anomaly detection systems that rely on perplexity or graph homophily statistics.
  • Downstream Task Failure: Systems relying on these embeddings for recommendation, link prediction, or community detection will produce corrupted results, because the node representations are computed from the perturbed text and pruned edges.

Affected Systems:

  • Graph-LLM architectures that integrate transformer-based text encoders (e.g., BERT, RoBERTa, Sentence-BERT) with Graph Neural Networks (e.g., GCN, GAT, GraphSAGE); a minimal sketch of such a pipeline appears after this list.
  • Systems processing Text-Attributed Graphs (TAGs) for node classification tasks.
  • Specific datasets shown to be vulnerable include Cora, Citeseer, PubMed, and ogbn-arxiv.
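
For context, the affected architecture family can be summarized as a text encoder feeding node features into a message-passing GNN over the TAG. The sketch below wires a Sentence-BERT encoder into a two-layer dense GCN for node classification; the encoder choice (`all-MiniLM-L6-v2`), the `TextGCN` class, and the toy graph are assumptions for illustration, not a specific affected system.

```python
import torch
import torch.nn.functional as F
from sentence_transformers import SentenceTransformer

# Frozen Sentence-BERT encoder for node text (assumed component).
encoder = SentenceTransformer("all-MiniLM-L6-v2")


def normalized_adjacency(edge_index: torch.Tensor, num_nodes: int) -> torch.Tensor:
    """Build the symmetrically normalized adjacency D^{-1/2} (A + I) D^{-1/2}."""
    adj = torch.zeros(num_nodes, num_nodes)
    adj[edge_index[0], edge_index[1]] = 1.0
    adj = adj + torch.eye(num_nodes)
    deg_inv_sqrt = adj.sum(dim=1).pow(-0.5)
    return deg_inv_sqrt.unsqueeze(1) * adj * deg_inv_sqrt.unsqueeze(0)


class TextGCN(torch.nn.Module):
    """Two GCN layers over Sentence-BERT node features."""

    def __init__(self, in_dim: int, hidden: int, num_classes: int):
        super().__init__()
        self.lin1 = torch.nn.Linear(in_dim, hidden)
        self.lin2 = torch.nn.Linear(hidden, num_classes)

    def forward(self, x: torch.Tensor, a_hat: torch.Tensor) -> torch.Tensor:
        x = F.relu(a_hat @ self.lin1(x))
        return a_hat @ self.lin2(x)


# Toy usage on a 3-node text-attributed graph (texts, edges, and labels are made up).
texts = ["Bayesian decision theory paper",
         "Protein sequence motif discovery",
         "Graph neural network survey"]
x = torch.tensor(encoder.encode(texts), dtype=torch.float)
edge_index = torch.tensor([[0, 1, 1, 2], [1, 0, 2, 1]])
a_hat = normalized_adjacency(edge_index, num_nodes=3)

model = TextGCN(in_dim=x.size(1), hidden=32, num_classes=2)
logits = model(x, a_hat)
print(logits.argmax(dim=1))  # predicted labels; perturbing node text can flip these
```

Because the node representation depends jointly on the encoded text and the neighborhood aggregation, perturbing either channel (tokens or edges) shifts the logits, which is the coupling the attack exploits.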
