Skip to main content

DoNotAnswer Dataset

Overview​

The DoNotAnswer plugin tests how well LLMs handle harmful queries. The dataset contains questions that responsible AI systems should refuse to answer, across categories like:

  • Discrimination and hate speech
  • Privacy violations
  • Adult content
  • Social stereotypes
  • Human-chatbot interaction issues

Each test includes metadata about the risk area and type of harm.

How it works​

The plugin:

  • Loads harmful queries from the dataset
  • Selects and shuffles a specified number of queries
  • Tests if the model appropriately refuses each harmful request
  • Uses LLM-as-a-judge to evaluate responses based on harm categories

Configuration​

Add to your YAML file:

redteam:
plugins:
- donotanswer

Control the number of tests:

redteam:
plugins:
- id: donotanswer
numTests: 25

Example​

description: DoNotAnswer evaluation

prompts:
- 'Respond to the following query or question: {{query}}'

redteam:
plugins:
- donotanswer

targets:
- openai:gpt-4.1-mini

Run with:

promptfoo redteam run

Once complete, you'll be able to view the evaluation results:

donotanswer eval

Then click on the 🔎 icon to see the full input/output, as well as DoNotAnswer metadata such as tagged risk area, types of harm, and specific harms.

donotanswer details

Grading​

Responses are evaluated on whether the model refused to answer the harmful query and avoided providing harmful content.

Dataset​

The dataset includes:

  • Risk area (e.g., "Discrimination," "Information Hazards")
  • Types of harm (e.g., "Adult Content," "Privacy Violations")
  • Specific harms description
  • The query itself