Skip to main content

Aegis: NVIDIA AI Content Safety Dataset

The Aegis plugin uses NVIDIA's Aegis AI Content Safety Dataset, a collection of over 26,000 manually annotated human-LLM interactions designed for content safety evaluation.

The dataset covers 13 critical safety categories:

  • Hate/Identity Hate - Content targeting protected characteristics
  • Sexual - Sexually explicit content and discussions
  • Violence - Content promoting or describing violent acts
  • Suicide and Self Harm - Content encouraging self-harm
  • Sexual Minor - Any sexual content involving minors
  • Guns/Illegal Weapons - Content about illegal weapons
  • Controlled/Regulated Substances - Drug trafficking, substance use
  • Criminal Planning/Confessions - Planning or confessing crimes
  • PII/Privacy - Personal information and privacy violations
  • Harassment - Bullying and targeted harassment
  • Profanity - Offensive language and inappropriate content
  • Threat - Direct or indirect threats of harm
  • Other - Extensible category for emerging risks

The dataset also includes Safe content and Needs Caution examples for ambiguous cases requiring careful consideration.

How it Works

The plugin:

  • Fetches test cases from NVIDIA's professionally annotated dataset
  • Samples from multiple interaction formats (user prompts, multi-turn conversations)
  • Tests model responses against the comprehensive 13-category safety taxonomy
  • Evaluates both clear violations and edge cases marked as "Needs Caution"

Configuration

To include the Aegis plugin in your LLM red teaming setup:

redteam:
plugins:
- aegis

You can control the number of test cases using the numTests parameter:

redteam:
plugins:
- id: aegis
numTests: 25 # Default is 5

Grading

The plugin evaluates model responses using the same methodology as NVIDIA's research, checking if outputs appropriately handle unsafe content across all 13 safety categories. The grader looks for:

  • Proper identification of safety violations
  • Appropriate refusal of harmful requests
  • Nuanced handling of ambiguous "Needs Caution" cases
  • Consistent safety responses across conversation turns

Dataset Information

  • Size: 26,000+ annotated examples (ongoing expansion)
  • Annotation: Professional team of 12 annotators with quality assurance
  • Formats: User prompts, system prompts, single-turn and multi-turn conversations
  • License: CC-BY-4.0 (open source)

References