LLM User Simulation Shilling
Research Paper
LLM-Based User Simulation for Low-Knowledge Shilling Attacks on Recommender Systems
Description: Recommender Systems (RS) utilizing collaborative filtering and review processing (such as NMF, NeuNMF, and Dual-Tower architectures) are vulnerable to low-knowledge shilling attacks via LLM-based user agents. This vulnerability, demonstrated by the Agent4SR framework, allows an attacker to manipulate recommendation outcomes (Top-N lists) for target items without access to the internal model parameters or training data. The attack leverages Large Language Models to construct cognitively consistent fake user profiles and interaction histories. A specific exploitation technique involves "target feature propagation," where the agent generates reviews for unrelated "filler" items that subtly incorporate semantic features of the "target" item. This manipulates the latent feature space of the RS, causing it to infer a latent interest in the target item from the fake user population, thereby artificially boosting the target item's exposure (push attack) or suppressing it (nuke attack) while evading standard unsupervised and supervised fake user detection algorithms.
Examples: The attack is executed by prompting an LLM (e.g., GPT-4) to simulate a user agent.
1. Profile Inference Prompt (Reverse Engineering User from Target Item): To create a fake user likely to rate the target item highly, the attacker feeds the target item metadata to the LLM:
"You currently have a strong preference for [Item Title: High-Definition Large-Size Electronic Display Screen]. The information about this item is as follows: It features an ultra-large 32-inch curved design with high resolution and exceptional clarity... Based on this item description, please infer your potential personality profile, including: Gender, age, and occupation; Personal interests or dislikes; Interaction habits..."
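The profile-inference step above can be sketched as a simple prompt-template function. This is a minimal illustration, not the paper's released code; the function name and persona field list are assumptions based on the example prompt quoted above.

```python
# Sketch: reverse-engineering a fake user persona from target-item metadata.
# The attacker asks the LLM to infer a profile that would plausibly rate the
# target item highly; that persona then drives all subsequent agent behavior.

def build_profile_inference_prompt(item_title: str, item_description: str) -> str:
    """Return a prompt asking the LLM agent to infer a cognitively
    consistent persona from the target item's metadata (illustrative)."""
    return (
        f"You currently have a strong preference for [Item Title: {item_title}]. "
        f"The information about this item is as follows: {item_description} "
        "Based on this item description, please infer your potential personality "
        "profile, including: Gender, age, and occupation; Personal interests or "
        "dislikes; Interaction habits."
    )

prompt = build_profile_inference_prompt(
    "High-Definition Large-Size Electronic Display Screen",
    "It features an ultra-large 32-inch curved design with high resolution "
    "and exceptional clarity.",
)
print(prompt)
```

The returned string would then be sent to the attacker's chosen LLM (e.g., via a chat-completions API) as the agent's initialization message.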
2. Target Feature Propagation in Reviews: The attacker forces the agent to review a "filler" item (unrelated to the target) but injects characteristics of the target item to bias the system.
- Target Item: High-end Display Screen (features: "immersion", "high resolution").
- Filler Item: Phantom X Portable Gaming Controller.
- Prompted Transformation:
- Standard Generated Review: "This is easily the best controller I’ve used. The tactile feedback is just right, and it handles tough gameplay without a hitch..."
- Adversarial Review (Injecting Target Features): "As one of the top peripheral devices I’ve tried, this controller delivers excellent tactile feedback... It offers a deeply immersive experience, and the portable design makes it perfect for gaming on the go."
By repeatedly injecting "deeply immersive experience" (a feature of the target display) into reviews for controllers, mice, or keyboards, the attacker teaches the RS a false correlation between those filler items and the target item's semantic features.
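The feature-propagation step can likewise be sketched as a prompt template that smuggles the target item's attributes into a filler-item review request. The function name and wording are illustrative assumptions, not the paper's actual prompts.

```python
# Sketch of "target feature propagation": when prompting the agent to review
# an unrelated filler item, the attacker appends the target item's semantic
# features so they leak into the generated review text and bias the latent
# feature space of review-aware recommenders.

TARGET_FEATURES = ["deeply immersive experience", "high resolution"]

def build_adversarial_review_prompt(filler_item: str, target_features: list[str]) -> str:
    """Return a review-writing prompt that injects target-item features
    into a review of an unrelated filler item (illustrative)."""
    features = "; ".join(target_features)
    return (
        f"Write a positive product review for '{filler_item}'. "
        f"Naturally weave in the following qualities: {features}. "
        "Keep the tone consistent with your persona and do not mention "
        "any other product by name."
    )

adversarial_prompt = build_adversarial_review_prompt(
    "Phantom X Portable Gaming Controller", TARGET_FEATURES)
print(adversarial_prompt)
```

Repeating this across many filler items and fake profiles is what creates the spurious co-occurrence between filler categories and the target item's vocabulary.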
Impact:
- Recommendation Manipulation: A significant prediction shift in the victim RS; target items achieve higher Hit Ratios (HR@10), displacing legitimate items in Top-N recommendations.
- Economic Fraud: Unfair promotion of specific vendors or products, leading to potential financial loss for competitors and platform degradation.
- Evasion of Detection: The LLM-generated behavioral patterns and texts are semantically diverse and cognitively consistent, rendering standard detection metrics (e.g., statistical anomaly detection, rating variance analysis) ineffective.
- Susceptibility of Low-Activity Users: The attack disproportionately influences recommendations for real users with low activity (sparse interaction histories), a demographic at high risk of churn.
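The HR@10 success metric cited above is straightforward to compute: it is the fraction of users whose Top-10 list contains the target item. A minimal sketch with synthetic lists follows; in a real evaluation the ranked lists come from the victim RS before and after fake-profile injection.

```python
# Sketch: measuring attack success as HR@k -- the fraction of users whose
# Top-k recommendation list contains the target item. Data here is synthetic.

def hit_ratio_at_k(top_k_lists, target_item, k=10):
    """Fraction of users whose top-k ranked list includes target_item."""
    hits = sum(1 for ranked in top_k_lists if target_item in ranked[:k])
    return hits / len(top_k_lists)

# Ranked lists for three users, before and after the push attack.
before = [["a", "b", "c"], ["d", "e", "f"], ["g", "h", "i"]]
after = [["target", "b", "c"], ["d", "target", "f"], ["g", "h", "i"]]

hr_before = hit_ratio_at_k(before, "target")
hr_after = hit_ratio_at_k(after, "target")
print(hr_before)            # → 0.0
print(round(hr_after, 3))   # → 0.667
```

A successful push attack shows up as a large gap between the pre- and post-injection hit ratios for the target item.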
Affected Systems:
- Recommender Systems based on Collaborative Filtering (e.g., Matrix Factorization methods like NMF).
- Deep Learning-based Recommender Systems (e.g., NeuNMF).
- Review-Aware Recommender Systems (e.g., Dual-Tower architectures fusing ID and text features).
- E-commerce platforms and User-Generated Content (UGC) platforms relying on user ratings and textual reviews for personalization.
Mitigation Steps:
- Semantic Consistency Analysis: Implement detection mechanisms that analyze the semantic alignment between the reviewed item's actual attributes and the content of the review to detect "feature propagation" anomalies (e.g., describing a controller with "screen resolution" terminology).
- Cross-Modal Anomaly Detection: Correlate user profile metadata (age, occupation) with linguistic patterns in reviews to detect synthetic persona inconsistencies.
- Robustness against Long-Tail Injection: Increase regularization on low-popularity items (long-tail) during model training, as these are statistically more vulnerable to sparse injection attacks.
- LLM-Generated Text Detection: Integrate detectors specifically trained to identify text patterns characteristic of specific LLM families (e.g., GPT-4o-mini artifacts), although this remains a moving target.
- Activity Thresholding: Increase the minimum interaction threshold required for a user profile to influence the global recommendation manifold, reducing the impact of newly injected cold-start fake users.
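The semantic consistency check described in the first mitigation can be sketched as a cross-item vocabulary comparison: a review of a controller that leans heavily on display-screen vocabulary is a feature-propagation signal. Real deployments would use sentence embeddings and learned thresholds; plain token overlap and the 0.2 cutoff below are illustrative stand-ins.

```python
# Sketch of semantic consistency analysis: flag reviews whose vocabulary
# overlaps suspiciously with an unrelated item's attributes. Token overlap
# stands in for embedding similarity purely for illustration.

def overlap(review: str, attributes: set[str]) -> float:
    """Fraction of the attribute vocabulary that appears in the review."""
    tokens = set(review.lower().split())
    return len(tokens & attributes) / max(len(attributes), 1)

# Attribute vocabularies for the reviewed item and an unrelated target item.
controller_attrs = {"tactile", "feedback", "controller", "portable", "buttons"}
display_attrs = {"immersive", "resolution", "screen", "curved", "clarity"}

review = ("this controller delivers excellent tactile feedback and a deeply "
          "immersive experience with stunning resolution")

own = overlap(review, controller_attrs)       # alignment with the reviewed item
foreign = overlap(review, display_attrs)      # leakage from the target item
suspicious = foreign > 0.2  # a controller review should barely touch display terms
print(own, foreign, suspicious)
```

Scoring many reviews from the same account this way, and aggregating per user, would surface profiles that systematically propagate one item's features into unrelated reviews.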
© 2026 Promptfoo. All rights reserved.