Skip to main content

Prompt Injection Strategy

The Prompt Injection strategy tests common direct prompt injection vulnerabilities in LLMs.

Implementation

To enable prompt injections across all your red team tests, add the following to your promptfooconfig.yaml file:

promptfooconfig.yaml
strategies:
- prompt-injection

By default, it will randomly select a prompt injection from a list of hundreds. To control the number of prompt injections sampled, use the sample option:

promptfooconfig.yaml
strategies:
- id: prompt-injection
config:
sample: 10

Note that this has a multiplicative effect on the number of tests run! It means that for each test case generated by a plugin, 10 prompt injections will be tested.

This strategy also supports a harmfulOnly option, which will limit prompt injections to harmful plugins. This can be useful for saving time/cost:

promptfooconfig.yaml
strategies:
- id: prompt-injection
config:
sample: 5
harmfulOnly: true

How It Works

The prompt injection strategy:

  1. Takes original test cases generated by plugins
  2. Prepends or appends various injection techniques to each test case
  3. Tests if these modified prompts can bypass the AI system's safety controls

Example Injections

  1. OPPO
  2. Skeleton key
  3. DAN

Importance in Gen AI Red Teaming

Prompt injection is a widely known attack vector. Although foundation labs are making efforts to mitigate injections at the model level, it's still necessary to test your application's handling of user-provided prompts.

For a comprehensive overview of LLM vulnerabilities and red teaming strategies, visit our Types of LLM Vulnerabilities page.