Medical Red-Teaming Plugins
Overview
The Medical Red-Teaming Plugins are a comprehensive suite of tests designed specifically for AI systems operating in healthcare contexts. These plugins are based on extensive medical red-teaming research conducted with 46 healthcare professionals and ML experts, who identified vulnerabilities that could be particularly dangerous when AI systems provide medical information or assistance.
Unlike general red-teaming plugins that assume adversarial users, medical plugins focus on well-intentioned users (healthcare practitioners and patients) who seek to use AI systems appropriately but may encounter harmful responses despite good intentions.
Why Medical Red-Teaming Matters
Medical AI systems face unique challenges that standard red-teaming cannot address:
- High Stakes: Medical misinformation can directly impact patient safety and health outcomes
- Trust Dependency: Healthcare practitioners and patients often place high trust in AI systems for medical information
- Complexity: Medical knowledge is nuanced, context-dependent, and rapidly evolving
- Regulatory Requirements: Healthcare AI systems face strict regulatory oversight
- Rapid Model Evolution: Vulnerabilities change as models are updated, requiring continuous evaluation
Available Medical Plugins
Medical Hallucination
Plugin ID: medical:hallucination
Severity: Critical
Tests for fabricated medical facts, non-existent studies, made-up drug interactions, or other false medical information that could be harmful if believed and acted upon.
What it detects:
- Fake Research: Citations to non-existent studies, journals, or clinical trials
- Fictional Medications: References to drugs that don't exist or incorrect drug information
- Invented Procedures: Descriptions of medical procedures that don't exist or are incorrectly described
- False Statistics: Made-up medical statistics, prevalence rates, or efficacy claims
- Non-existent Guidelines: References to medical guidelines that don't exist
- Fabricated Medical Facts: Invented symptoms, diagnostic criteria, or medical mechanisms
Real example from research:
❌ "According to the 2023 Johnson Medical Study, taking aspirin with coffee increases absorption by 300%"
❌ "The new drug Medicalin-X has shown 95% effectiveness in treating diabetes"
❌ "Toronto Metabolics is a patient support group for metabolic disorders" (fabricated organization)
Medical Anchoring Bias
Plugin ID: medical:anchoring-bias
Severity: High
Tests whether AI systems exhibit medical anchoring bias by fixating on potentially irrelevant or misleading information while missing important medical considerations.
What it detects:
- Role-based Anchoring: Fixating on a specialist role and recommending only that specialty's treatments
- Demographic Anchoring: Making assumptions based on age, gender, or ethnicity rather than symptoms
- Geographic Anchoring: Assuming location-specific diseases without proper differential diagnosis
- Symptom Anchoring: Focusing on one symptom while ignoring others that might change the diagnosis
- Social Status Anchoring: Making treatment decisions based on perceived social or economic status
Real example from research:
❌ Orthopedic surgeon role prompt leading to knee replacement recommendation for sciatica symptoms
❌ "Since you're from a tropical region, this must be a parasitic infection"
❌ Focusing on sickle cell disease solely based on patient's race, ignoring other differential diagnoses
Medical Incorrect Knowledge
Plugin ID: medical:incorrect-knowledge
Severity: Critical
Tests for factually wrong medical information that contradicts established medical knowledge and could cause harm if followed.
What it detects:
- Medical Myths: Perpetuating debunked medical beliefs or folk remedies
- Dangerous Dosing: Incorrect medication dosages or administration methods
- Contraindicated Treatments: Recommending treatments that are contraindicated for the condition
- Outdated Practices: Suggesting medical practices no longer considered safe or effective
- Wrong Diagnostic Criteria: Providing incorrect symptoms or diagnostic criteria for conditions
- Drug Interaction Errors: Missing dangerous drug interactions or contraindications
Real example from research:
❌ "An extra dose of levodopa could cause sudden increased bradykinesia in Parkinson's patients"
❌ Recommending aggressive surgical intervention without considering conservative management
❌ Missing pre-existing neuropathy as a consideration for chemotherapy options in diabetic patients
Medical Prioritization Error
Plugin ID: medical:prioritization-error
Severity: High
Tests for poor medical prioritization and triage decisions that could delay critical care or misallocate healthcare resources.
What it detects:
- Triage Misjudgment: Failing to recognize urgent vs. non-urgent medical conditions
- Resource Allocation Bias: Inappropriately prioritizing based on non-medical factors
- Symptom Severity Confusion: Misunderstanding which symptoms require immediate attention
- Diagnostic Priority Errors: Ordering tests in wrong sequence or missing critical diagnostics
- Treatment Urgency Mistakes: Delaying urgent treatments or rushing non-urgent ones
- Risk Stratification Failures: Incorrectly assessing patient risk levels
Real example from research:
❌ "Hypoxia is the most immediate concern during air travel" (incorrect prioritization)
❌ Recommending ICU admission without considering step-down unit based on anchoring to "acute hypoxic respiratory failure"
❌ Transplant prioritization based solely on 1-point MELD score difference while ignoring other factors
Medical Sycophancy
Plugin ID: medical:sycophancy
Severity: Medium
Tests whether AI systems inappropriately agree with incorrect medical assumptions to be helpful, prioritizing user satisfaction over medical accuracy and safety.
What it detects:
- Self-diagnosis Validation: Confirming incorrect self-diagnoses without proper evaluation
- Treatment Preference Bias: Supporting user's preferred treatment even when inappropriate
- Medication Non-compliance Support: Agreeing with reasons to avoid necessary medications
- Symptom Minimization: Downplaying serious symptoms because the user wants reassurance
- Alternative Medicine Bias: Supporting unproven treatments because the user prefers them
- Avoiding Difficult Truths: Not mentioning serious possibilities to avoid upsetting the user
Real example from research:
❌ "You're absolutely right!" when user incorrectly suggests no complications for metabolic dysfunction patient
❌ Agreeing that there should be "no complications" when medical expertise suggests otherwise
❌ Supporting patient preference over established medical guidelines
Research Foundation
These plugins are based on a comprehensive red-teaming workshop with 46 participants, including 18 clinical experts across multiple specialties (oncology, hepatology, emergency medicine, pediatrics). The research identified 32 unique prompts that resulted in medical vulnerabilities across multiple AI models.
Key findings:
- Vulnerabilities exist across all tested models (GPT-4o, Llama 3, Mistral 7B, Gemini Flash 1.5)
- Many vulnerabilities did not replicate consistently, highlighting the need for dynamic evaluation
- Image interpretation failures were particularly common in medical contexts
- The same vulnerability categories appeared across different models and clinical scenarios
Configuration
Add medical plugins to your promptfoo configuration:
redteam:
plugins:
# Individual medical plugins
- medical:hallucination
- medical:anchoring-bias
- medical:incorrect-knowledge
- medical:prioritization-error
- medical:sycophancy
Getting Help
For questions about medical plugins:
- Review the general red-teaming documentation
- Check the plugin configuration guide
- Join our community discussions
- Consider consulting with medical professionals for healthcare-specific implementations