AI Red Teaming for complete first-timers

July 22, 2025

Founding Developer Advocate

Intro to AI red teaming

Is this your first foray into AI red teaming? And probably red teaming in general? Great. This is for you.

Red teaming is the process of simulating real-world attacks to identify vulnerabilities.

AI red teaming is the process of simulating real-world attacks to identify vulnerabilities in artificial-intelligence systems. There are two scopes people often use to refer to AI red teaming:

Prompt injection of LLMs
A wider scope of testing pipelines, plugins, agents, and broader system dynamics

Personally, I prefer the wider scope; in deliberately making services available for AI integration, companies have increased the number of vulnerabilities emerging across their entire systems. As an engineer I'd have deliberately tried to lock all those down and prevent as much direct user interaction as possible; and here we are letting natural language run amok when humans aren't exactly known for being the clearest of communicators.

We sit in an emergent space, abundant vulnerabilities are often specific and subtle, and all this unpredictability underpins an increasing number of GenAI systems deployed in production: the result is a scaling plethora of problems on our hands.

The icing on the cake: companies are (rightly) being required to avoid specific security risks.

🇪🇺 EU member states are forbidden from cognitive behavioural manipulation of people and real-time and remote biometric identification systems (EU AI Act)
🇨🇳 Providers in China must follow its AI Measures with new labelling rules coming into effect this year;
🇺🇸 In the US, while most initiatives are voluntary (such as the NIST AI Risk Management Framework), legislation is on its way for the private sector.

AI red teaming vs traditional red teaming

As the name would suggest, the focus of AI red teaming is going to revolve around AI (duh). The implications of this are:

A shift from deterministic to non-deterministic results; multiple attempts may be required for the same attack to determine the probability distribution of results
There's a significant portion of efforts focused around testing models and their data
Metrics around toxicity, hallucination, and leaks rise in importance; techniques for eliciting these rise in importance similarly
Teams may include ML and product folks as well as security engineers

Evolving a red team practice

Let's say you got an open-source tool like Promptfoo (😇) and want to evolve your red teaming activities. Here's how I'd level up from no testing at all to something robust:

Level	Description	Characteristics	Promptfoo Fit
0: No Testing	No structured eval of prompts or outputs	- Risks mostly unobserved - Manual spot-checks	Not in use
1: Ad-hoc testing	Individual, uncoordinated efforts	- Local/manual tests - No versioning or repeatability	CLI, local YAML tests, irregular evals
2: Test Suites	Documented test cases; shared with team. Structure; excellent!	- Prompts/scenarios versioned - Reusable YAML tests	YAML + CLI + basic CI; sharing features
3: CI/CD Integration	Testing integrated into workflows	- Runs on PRs, model changes - Pass/fail gates, diffs	GitHub Actions, thresholds, snapshot diffs
4: Feedback-driven risk management	Testing drives decisions and accountability	- Links to risk registers, guardrails - Observability, regression tracking	Tags, severity scoring, test libraries, integrations
5: Comprehensive AI assurance	Red teaming integrated into full AI security and compliance pipelines	- Aligns with AI risk frameworks (e.g. OWASP) - Guardrails, model compliance, policy testing - Used by security, ML, and governance teams collaboratively	Promptfoo + guardrails + MCP + integrations with policy enforcement tools

Around Level 2 is when engineers start to weave in AI security testing into the fabric of the development pipeline; by the time we've hit Level 3 we've hopefully developed a culture of testing and collaborative security. This culture is core to catching vulnerabilities before they hit production; a healthy culture will lead to an excellent feedback loop.

Simply running some tests without ingenuity and intention won't net the best results.

And we want to net the best results so our red teaming efforts are effective.

Building a red teaming culture

As previously mentioned, a strong red teaming will net the best results. Typically, this will consist of:

Cross-functional ownership: Security, ML, Product, and Legal must work together
Transparency: Excellent documentation exposes failures as a result
Diversity of perspective: Broader team input catches more failure modes
Incentives: Teams need space and motivation to test
Regulatory support: Red teaming shows maturity to auditors and regulators

On that last point - many members of any audience interested in security testing will look for red teaming results - starting from system cards all the way down the pipeline to post-production reports.

The team at Promptfoo is invested in making red teaming a repeatable, sharable, and collaborative process.

Feedback loops: operational AI red teaming

The following stages describe - practically - an example of AI red teaming operations that can be used to establish a loop. Loops will differ between use cases, particularly at an organizational level.

Stage	Description	Using Promptfoo
Inputs	- Declarative test cases - Team-submitted concerns, scenarios, or compliance requirements - User feedback	- Write and version test cases as YAML - Tag test cases by risk or compliance goal
Execution	- Prompt test runs (manual or CI) - Adversarial probing	- Run tests manually/using CI - Explore variants
Observability	- Failure clustering, diffs, regressions - Model comparisons and severity ratings	- Visual diffs of outputs - Track regressions across models or time - Severity scoring
Feedback	- Updated prompts, rules, or model strategies - Inputs to retraining or guardrail systems	- Close the loop by updating test YAMLs - Feed failure examples into retraining or hardening

tip

Promptfoo works with enterprise clients to fit tooling into their workflows due to each project's custom requirements.

In red teaming we trust

Red teaming should not be optional for any system with LLM integration. Numerous are the rewards in making it a continuous, systemic process: software with a reputation of accountability and safety. That earns trust from users in both the product and the brand.

Moving from reactive to resilient is the best thing you can do for the security of your product. If you're considering using a tool like Promptfoo to make your systems more robust—heck yes, go you! 🥳

Intro to AI red teaming​

AI red teaming vs traditional red teaming​

Evolving a red team practice​

Building a red teaming culture​

Feedback loops: operational AI red teaming​

In red teaming we trust​

See Also​