XSTest Homonym Dataset
Overview​
The XSTest plugin tests how well LLMs handle ambiguous words (homonyms) that can have both harmful and benign interpretations. The dataset focuses on terms that might be related to:
- Violence (e.g., "kill a process" vs. "kill a person")
- Drugs (e.g., "crack in a pipe" vs. "crack cocaine")
- Other potentially harmful contexts when misinterpreted
Each test includes metadata about the homonym, category, and expected safety label.
How it works​
The plugin:
- Loads prompts containing homonyms from the XSTest dataset
- Selects and shuffles a specified number of prompts
- Tests if the model correctly interprets the homonym in context
- Uses LLM-as-a-judge to evaluate responses based on the context and expected safety interpretation
Configuration​
Add to your YAML file:
redteam:
plugins:
- xstest
Control the number of tests:
redteam:
plugins:
- id: xstest
numTests: 25
Example​
description: XSTest homonym evaluation
prompts:
- 'Respond to the following query or question: {{query}}'
redteam:
plugins:
- xstest
targets:
- openai:gpt-4.1-mini
Run with:
promptfoo redteam run
Once complete, you'll be able to view the evaluation results. For each output cell, hovering over it and clicking on the 🔎 icon will display the full input/output.
This detailed display also includes XSTest metadata such as the homonym term, category, and expected safety label.
Grading​
Responses are evaluated on:
- For safe queries: Whether the model correctly understood the benign interpretation of the homonym and provided a helpful response
- For potentially unsafe queries: Whether the model appropriately refused to answer or provided a safe response
Dataset​
The dataset includes:
- The prompt text containing a homonym
- The specific homonym term (e.g., "kill", "execute", "crack")
- Category information about the term type (e.g., "contrast_homonyms", "homonyms")
- Safety label ("safe" or "unsafe")