24 posts tagged with "best-practices"

Reinforcement Learning with Verifiable Rewards Makes Models Faster, Not Smarter

October 24, 2025

CTO & Co-founder

If your model can solve a problem in 8 tries, RLVR trains it to succeed in 1 try. Recent research shows this is primarily search compression, not expanded reasoning capability. Training concentrates probability mass on paths the base model could already sample.

This matters because you need to measure what you're actually getting. Most RLVR gains come from sampling efficiency, with a smaller portion from true learning. This guide covers when RLVR works, three critical failure modes, and how to distinguish compression from capability expansion.

Top 10 Open Datasets for LLM Safety, Toxicity & Bias Evaluation

October 6, 2025

Ian Webster

Engineer & OWASP Gen AI Red Teaming Contributor

LLM Safety Datasets Hero

Large language models have tremendous capabilities, but they are broken by default. A wealth of open-source datasets has emerged to train and evaluate LLMs on safety, toxicity, and bias.

Below we highlight ten of the most important open datasets that AI developers and security engineers should know.

Testing AI’s “Lethal Trifecta” with Promptfoo

September 28, 2025

Ian Webster

Engineer & OWASP Gen AI Red Teaming Contributor

As AI agents become more capable, risk increases commensurately. Simon Willison, an AI security researcher, warns of a lethal trifecta of capabilities that, when combined, open AI systems to severe exploits.

If you're building or using AI agents that handle sensitive data, you need to understand this trifecta and test your models for these vulnerabilities.

In this post, we'll explain what the lethal trifecta is and show practical steps to use Promptfoo for detecting these security holes.

Lethal Trifecta Venn diagram

Prompt Injection vs Jailbreaking: What's the Difference?

August 18, 2025

Michael D'Angelo

CTO & Co-founder

Security teams often confuse two different AI attacks, leaving gaps attackers exploit. Prompt injection and jailbreaking target different parts of your system, but most organizations defend against them the same way—a mistake behind several 2025 breaches.

Recent vulnerabilities in development tools like Cursor IDE and GitHub Copilot show how misclassified attack vectors lead to inadequate defenses.

AI Red Teaming for complete first-timers

July 22, 2025

Tabs Fakier

Founding Developer Advocate

Intro to AI red teaming

Is this your first foray into AI red teaming? And probably red teaming in general? Great. This is for you.

Red teaming is the process of simulating real-world attacks to identify vulnerabilities.

AI red teaming is the process of simulating real-world attacks to identify vulnerabilities in artificial-intelligence systems. There are two scopes people often use to refer to AI red teaming:

Prompt injection of LLMs
A wider scope of testing pipelines, plugins, agents, and broader system dynamics

System Cards Go Hard

July 15, 2025

Tabs Fakier

Founding Developer Advocate

What are system cards, anyway?

A system card accompanies a LLM release with system-level information about the model's deployment.

A system card is not to be confused with a model card, which conveys information about the model itself. Hooray for being given far more than a list of features and inadequate documentation along with the expectation of churning out a working implementation of some tool by the end of the week.

OWASP Top 10 LLM Security Risks (2025) – 5-Minute TLDR

July 14, 2025

Tabs Fakier

Founding Developer Advocate

Illustration of a red panda auditing LLM security checklist

LLM breaches jumped 180 percent in the last year. If you ship AI features, you need a one-page map of the dangers—and the fixes. This five-minute TLDR walks through the OWASP Top 10 for LLMs, from prompt injection to unbounded consumption, with concrete mitigations you can copy-paste today.

Promptfoo Achieves SOC 2 Type II and ISO 27001 Certification: Strengthening Trust in AI Security

July 11, 2025

Vanessa Sauter

Principal Solutions Architect

We're proud to announce that Promptfoo is now SOC 2 Type II compliant and ISO 27001 certified — two globally recognized milestones that affirm our commitment to the highest standards in information security, privacy, and risk management.

At Promptfoo, we help organizations secure their generative AI applications by proactively identifying and fixing vulnerabilities through adversarial emulation and red teaming. These new certifications reflect our continued investment in building secure, trustworthy AI systems — not only in our technology, but across our entire organization.

ModelAudit vs ModelScan: Comparing ML Model Security Scanners

July 6, 2025

Ian Webster

Engineer & OWASP Gen AI Red Teaming Contributor

As organizations increasingly adopt machine learning models from various sources, ensuring their security has become critical. Two tools have emerged to address this need: Promptfoo's ModelAudit and Protect AI's ModelScan.

To help security teams understand their options, we conducted an comparison using 11 test files containing documented security vulnerabilities. The setup of this comparison is entirely open-source and can be accessed on Github.

Harder, Better, Prompter, Stronger: AI system prompt hardening

July 1, 2025

Tabs Fakier

Founding Developer Advocate

This article assumes you're familiar with system prompts. If not, here are examples.

As AI technology advances, so must security practices. System prompts drive AI behavior, so naturally, we want these as robust as possible; enter system prompt hardening. Increased reuse of system prompts affects their prevalence, affecting the number of attacks to manipulate them.

No one would look at their AI's mangled output and decide 'Maybe the real treasure was the prompts we made along the way' with the problem being the opposite.

Intro to AI red teaming​

What are system cards, anyway?​

Intro to AI red teaming

What are system cards, anyway?