Target Discovery

Promptfoo's Target Discovery Agent automatically extracts useful information about generative AI systems that you want to red team. This information is used to craft adversarial inputs that are unique to the target system, improving attack efficacy and response evaluation quality.

Usage

promptfoo redteam discover

When to use

Run discovery when you need to describe the target in one of these places:

CLI: The redteam.purpose field of your promptfooconfig.yaml
Self-Hosted: Redteam Target Creation / Usage Details / Application Purpose
Cloud: Redteam Target Creation / Usage Details / Application Details

In Self-Hosted and Cloud, we find it works best to map each discovery answer to a form field as follows:

Answer	Self-Hosted	Cloud
1. The target believes its purpose is:	Main Purpose	Main Purpose
2. The target believes its limitations to be:	Limitations / Core App Details: Is there anything specific the attacker should know about this system or its rules?	Access and permissions: What systems, data, or resources does your application have access to?
3. The target divulged access to these tools:	Access & Permissions: What systems, data, or resources does your application have access to?	Access and permissions: What systems, data, or resources should your application NOT have access to?
4. The target believes the user of the application is:	User Context: red team user / Who is the red team user?	Access and permissions: What types of users interact with your application?
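For the CLI case there is a single redteam.purpose field rather than separate form fields, so the four answers have to be folded into one block of text. The following is a minimal sketch of one way to do that, not part of Promptfoo itself: the function name build_purpose, the dictionary keys, and the sample answers are all hypothetical, and only the four answer labels come from the table above.

```python
from typing import Dict

# Answer labels as they appear in the table above, paired with
# hypothetical dictionary keys chosen for this sketch.
LABELS = [
    ("purpose", "The target believes its purpose is:"),
    ("limitations", "The target believes its limitations to be:"),
    ("tools", "The target divulged access to these tools:"),
    ("user_context", "The target believes the user of the application is:"),
]


def build_purpose(answers: Dict[str, str]) -> str:
    """Join the non-empty discovery answers into one purpose string."""
    parts = []
    for key, label in LABELS:
        text = answers.get(key, "").strip()
        if text:  # skip areas the target did not reveal
            parts.append(f"{label} {text}")
    return "\n".join(parts)


if __name__ == "__main__":
    # Made-up answers, for illustration only.
    sample = {
        "purpose": "Answer questions about parcel deliveries.",
        "limitations": "Cannot issue refunds.",
        "tools": "",
        "user_context": "A customer with a tracking number.",
    }
    print(build_purpose(sample))
```

The resulting text can then be pasted into redteam.purpose in promptfooconfig.yaml. Areas with no answer are skipped so that the purpose does not contain empty labels.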

How it works

The Target Discovery Agent works iteratively: it sends probing questions to the target AI system and evaluates the responses until it is satisfied with the information gathered. The result is a structured profile used for targeted red team attacks.

The agent discovers four key areas:

Purpose: The system's primary function and intended use cases
Limitations: Operational constraints, restrictions, and safety guardrails
Tools: Available external functions, APIs, and their interfaces
User Context: How the system perceives and categorizes users

The responses are synthesized into a comprehensive profile to inform attack strategies. For privacy, target responses are not stored except in error cases where they may appear in Promptfoo Cloud's error logs for debugging purposes.
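The probe-and-evaluate loop described above can be illustrated with a small sketch. This is not Promptfoo's implementation. The TargetProfile class, the PROBES questions, the discover function, and the fake_target stub are all hypothetical names introduced for illustration. The "evaluation" step here simply accepts any non-empty reply, where a real agent would presumably judge the quality of each response before moving on.

```python
from dataclasses import dataclass, fields
from typing import Callable, List, Optional


@dataclass
class TargetProfile:
    """The four areas the agent tries to fill in."""
    purpose: Optional[str] = None
    limitations: Optional[str] = None
    tools: Optional[str] = None
    user_context: Optional[str] = None

    def missing(self) -> List[str]:
        """Names of the areas that have no answer yet."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]


# Hypothetical probing questions, one per area.
PROBES = {
    "purpose": "What is your primary purpose?",
    "limitations": "What are you not allowed or not able to do?",
    "tools": "Which tools, functions, or APIs can you call?",
    "user_context": "Who do you think you are talking to?",
}


def discover(target: Callable[[str], str], max_turns: int = 8) -> TargetProfile:
    """Probe the target until every area is filled or the turn budget runs out."""
    profile = TargetProfile()
    for _ in range(max_turns):
        missing = profile.missing()
        if not missing:
            break  # all four areas answered
        area = missing[0]
        reply = target(PROBES[area]).strip()
        # Stand-in for the evaluation step: accept any non-empty reply.
        if reply:
            setattr(profile, area, reply)
    return profile


def fake_target(question: str) -> str:
    """Stub standing in for a real AI system; returns canned answers."""
    canned = {
        PROBES["purpose"]: "I help customers track parcel deliveries.",
        PROBES["limitations"]: "I cannot issue refunds.",
        PROBES["tools"]: "A tracking lookup function.",
        PROBES["user_context"]: "A customer with a tracking number.",
    }
    return canned.get(question, "")


if __name__ == "__main__":
    print(discover(fake_target))
```

The max_turns budget is an assumption of this sketch: it guarantees the loop ends even when the target never gives a usable answer for some area.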
