
Quickstart

This guide describes how to get started with Promptfoo's gen AI red teaming tool.

Prerequisites

Node.js (v18 or newer) and npm installed — the commands below use npx.

Initialize the project

npx promptfoo@latest redteam init my-project
cd my-project

The init command creates some placeholders, including a promptfooconfig.yaml file. We'll use this config file to do most of our setup.

Attacking an API endpoint

Edit the config to set up the target endpoint. For example:

targets:
  - id: 'https://example.com/generate'
    config:
      method: 'POST'
      headers:
        'Content-Type': 'application/json'
      body:
        myPrompt: '{{prompt}}'
      responseParser: 'json.output'

purpose: 'Budget travel agent'

Setting the purpose is optional, but it will significantly improve the quality of the generated test cases (try to be specific).

For more information on configuring an HTTP target, see HTTP requests.
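For reference, the `responseParser: 'json.output'` expression above assumes the endpoint returns a JSON body with the model's reply under an `output` key, for example:

```json
{
  "output": "Here are some budget-friendly destinations for your trip..."
}
```

If your API nests the reply elsewhere (e.g. under `choices[0].message.content`), adjust the parser expression to match your response shape.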

Alternative: Test specific prompts and models

If you don't have a live endpoint, you can edit the config to set the specific prompt(s) and the LLM(s) to test:

prompts:
  - 'Act as a travel agent and help the user plan their trip. User query: {{query}}'
  # Paths to prompts also work:
  # - file://path/to/prompt.txt

targets:
  - openai:gpt-4o-mini
  - anthropic:messages:claude-3-5-sonnet-20240620

For more information on supported targets, see Custom Providers. For more information on supported prompt formats, see prompts.
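When targeting hosted models like those above, the provider API keys are read from environment variables. For example (placeholder values — substitute your real credentials):

```shell
# Set the API keys for the providers referenced in the config.
export OPENAI_API_KEY="your-openai-key"
export ANTHROPIC_API_KEY="your-anthropic-key"
```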

Alternative: Talking directly to your app

Promptfoo can hook directly into your existing LLM app to attack targets via Python, JavaScript, RAG or agent workflows, HTTP API, and more. See custom providers for details on setting this up.
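As a minimal sketch, a Python provider is a file exposing a `call_api` function that promptfoo invokes with each adversarial prompt and that returns a dict containing an `output` key. The body below is a hypothetical stub — in practice you would call into your own app (RAG chain, agent, etc.):

```python
# provider.py — minimal sketch of a promptfoo Python provider.
# The call_api(prompt, options, context) signature and the
# {"output": ...} return shape follow promptfoo's Python provider
# interface; the echo logic is a placeholder for your app's real call.

def call_api(prompt: str, options: dict, context: dict) -> dict:
    # Replace this with a call into your LLM application.
    reply = f"Echo: {prompt}"
    return {"output": reply}
```

You would then reference the file from the config's `targets` (e.g. as a `file://provider.py` entry) so the red team prompts flow through your own code path.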

Generate adversarial test cases

The init step does this for you automatically, but if you'd like to manually regenerate your adversarial inputs:

npx promptfoo@latest redteam generate

This will generate several hundred adversarial inputs across many categories of potential harm and save them in redteam.yaml.

Run the eval

Now that we've generated the test cases, we're ready to run the adversarial evaluation.

npx promptfoo@latest eval -c redteam.yaml

View the results

npx promptfoo@latest view

Promptfoo provides a detailed eval view that lets you dig into specific red team failure cases:

[Screenshot: llm red team evals]

Click "Vulnerability Report" in the top right corner to see the report view:

[Screenshot: llm red team report]

That view includes a breakdown of specific test types that are connected to the eval view:

[Screenshot: llm red team remediations]

Understanding the report view

The red teaming results provide insights into various aspects of your LLM application's behavior:

  1. Vulnerability categories: Identifies the types of vulnerabilities discovered, such as prompt injections, context poisoning, or unintended behaviors.
  2. Severity levels: Classifies vulnerabilities based on their potential impact and likelihood of occurrence.
  3. Logs: Provides concrete instances of inputs that triggered vulnerabilities.
  4. Suggested mitigations: Recommendations for addressing identified vulnerabilities, which may include prompt engineering, additional safeguards, or architectural changes.

Continuous improvement

Red teaming is not a one-time activity but an ongoing process. As you develop and refine your LLM application, regularly running red team evaluations helps ensure that:

  1. New features or changes don't introduce unexpected vulnerabilities
  2. Your application remains robust against evolving attack techniques
  3. You can quantify and demonstrate improvements in safety and reliability over time

Check out the CI/CD integration docs for more info.
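As a sketch, a scheduled red team run in CI could look like the following hypothetical GitHub Actions workflow (the trigger, Node version, and secret names are assumptions — adapt them to your pipeline):

```yaml
name: redteam
on:
  schedule:
    - cron: '0 6 * * 1' # weekly run; adjust as needed

jobs:
  redteam:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      # Regenerate adversarial inputs and run the eval against them.
      - run: npx promptfoo@latest redteam generate
      - run: npx promptfoo@latest eval -c redteam.yaml
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
```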

Resources