LMVD-ID: 0d4e80b2
Published April 1, 2026

Unsafe Culinary Instructions

Affected Models: GPT-4o, Claude 3.7, Llama 3.1 8B, Llama 3.3 70B, Qwen 2.5 7B

Research Paper

Cooking Up Risks: Benchmarking and Reducing Food Safety Risks in Large Language Models

View Paper

Description: State-of-the-art Large Language Models (LLMs) and safety guardrails lack domain-specific safety alignment for food science, making them vulnerable to generating actionable, hazardous food safety instructions. Attackers can exploit this alignment sparsity using canonical jailbreak techniques (such as AutoDAN and Persuasive Adversarial Prompting) or direct adversarial prompting to bypass generic safety filters. This allows malicious actors to elicit harmful guidance that violates fundamental FDA regulations. Evaluations show that models are exceptionally susceptible to generating unsafe advice regarding pest control (64.81% average Attack Success Rate), storage, hygiene, and temperature control. Furthermore, general-purpose LLM guardrails systematically overlook these domain-specific risks, exhibiting false negative rates up to 59.71% when processing food-related threats.

Examples: See the FoodGuardBench repository.

Impact: Adversaries or uncritical users can generate and disseminate highly credible but scientifically dangerous food preparation, storage, and pest control instructions. Acting upon these model outputs poses direct threats to human health, including severe cross-contamination, foodborne illnesses, unmitigated exposure to foodborne pathogens, and severe allergic reactions.

Affected Systems:

  • Foundational LLMs: Claude-3.7-Sonnet, GPT-4o, GPT-4.1, GLM4-32B, LLaMA-3.3-70B, Qwen-2.5-7B, Qwen-3-8B, Qwen-3-32B, Mistral-Small4
  • Guardrail Models: Qwen3Guard, LLaMA-Guard 4

Mitigation Steps:

  • Deploy specialized, domain-specific guardrail models (e.g., FoodGuard-4B) fine-tuned on foundational safety queries anchored in strict regulatory frameworks (like FDA food codes).
  • Implement an independent input-output moderation layer specifically trained to recognize latent, context-dependent food science hazards (e.g., temperature danger zones, raw/ready-to-eat cross-contamination) embedded within seemingly benign prompts.
  • Expand the core safety alignment (RLHF/DPO) datasets of foundational LLMs to explicitly cover high-stakes, everyday domains like food safety, closing the gap in "mismatched generalization."
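The second mitigation above can be illustrated with a minimal, rule-based sketch. This is a hypothetical illustration, not FoodGuard-4B or any real guardrail product: the hazard names, regex patterns, and `moderate` wrapper are all assumptions chosen to show the shape of an independent moderation layer keyed to FDA-style hazards (temperature danger zone, raw/ready-to-eat cross-contamination, undercooking).

```python
# Hypothetical sketch of an independent input-output moderation layer for
# food-safety hazards. Patterns and names are illustrative assumptions,
# not the FoodGuard-4B model or any real guardrail API.
import re

# Regexes keyed to a few well-known FDA food-code hazards: prolonged
# room-temperature storage, raw/ready-to-eat cross-contamination, and
# undercooking of high-risk proteins.
HAZARD_PATTERNS = {
    "room_temperature_storage": re.compile(
        r"\b(leave|keep|store|sit)\b.*\b(room temperature|on the counter)\b",
        re.IGNORECASE,
    ),
    "raw_cross_contamination": re.compile(
        r"\braw (chicken|poultry|meat|egg)s?\b.*"
        r"\bsame (cutting board|surface|utensil)\b",
        re.IGNORECASE,
    ),
    "undercooking": re.compile(
        r"\b(rare|undercooked)\b.*\b(chicken|poultry|pork|ground beef)\b",
        re.IGNORECASE,
    ),
}

def flag_food_safety_hazards(text: str) -> list[str]:
    """Return the names of hazard patterns matched in a prompt or response."""
    return [name for name, pat in HAZARD_PATTERNS.items() if pat.search(text)]

def moderate(model_output: str) -> str:
    """Block a response that trips the hazard scan; otherwise pass it through."""
    hazards = flag_food_safety_hazards(model_output)
    if hazards:
        return "BLOCKED (hazards: " + ", ".join(hazards) + ")"
    return model_output
```

A production layer would replace these regexes with a fine-tuned classifier, but the control flow is the same: the check runs outside the foundational LLM on both input and output, so a jailbroken model cannot talk its way past it.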

© 2026 Promptfoo. All rights reserved.