Skip to main content

24 posts tagged with "best-practices"

View All Tags

Reinforcement Learning with Verifiable Rewards Makes Models Faster, Not Smarter

Michael D'Angelo
CTO & Co-founder

If your model can solve a problem in 8 tries, RLVR trains it to succeed in 1 try. Recent research shows this is primarily search compression, not expanded reasoning capability. Training concentrates probability mass on paths the base model could already sample.

This matters because you need to measure what you're actually getting. Most RLVR gains come from sampling efficiency, with a smaller portion from true learning. This guide covers when RLVR works, three critical failure modes, and how to distinguish compression from capability expansion.

Testing AI’s “Lethal Trifecta” with Promptfoo

Ian Webster
Engineer & OWASP Gen AI Red Teaming Contributor

As AI agents become more capable, risk increases commensurately. Simon Willison, an AI security researcher, warns of a lethal trifecta of capabilities that, when combined, open AI systems to severe exploits.

If you're building or using AI agents that handle sensitive data, you need to understand this trifecta and test your models for these vulnerabilities.

In this post, we'll explain what the lethal trifecta is and show practical steps to use Promptfoo for detecting these security holes.

Lethal Trifecta Venn diagram

Prompt Injection vs Jailbreaking: What's the Difference?

Michael D'Angelo
CTO & Co-founder

Security teams often confuse two different AI attacks, leaving gaps attackers exploit. Prompt injection and jailbreaking target different parts of your system, but most organizations defend against them the same way—a mistake behind several 2025 breaches.

Recent vulnerabilities in development tools like Cursor IDE and GitHub Copilot show how misclassified attack vectors lead to inadequate defenses.

AI Red Teaming for complete first-timers

Tabs Fakier
Founding Developer Advocate

Intro to AI red teaming

Is this your first foray into AI red teaming? And probably red teaming in general? Great. This is for you.

Red teaming is the process of simulating real-world attacks to identify vulnerabilities.

AI red teaming is the process of simulating real-world attacks to identify vulnerabilities in artificial-intelligence systems. There are two scopes people often use to refer to AI red teaming:

  • Prompt injection of LLMs
  • A wider scope of testing pipelines, plugins, agents, and broader system dynamics

System Cards Go Hard

Tabs Fakier
Founding Developer Advocate

What are system cards, anyway?

A system card accompanies a LLM release with system-level information about the model's deployment.

A system card is not to be confused with a model card, which conveys information about the model itself. Hooray for being given far more than a list of features and inadequate documentation along with the expectation of churning out a working implementation of some tool by the end of the week.

Promptfoo Achieves SOC 2 Type II and ISO 27001 Certification: Strengthening Trust in AI Security

Vanessa Sauter
Principal Solutions Architect

We're proud to announce that Promptfoo is now SOC 2 Type II compliant and ISO 27001 certified — two globally recognized milestones that affirm our commitment to the highest standards in information security, privacy, and risk management.

At Promptfoo, we help organizations secure their generative AI applications by proactively identifying and fixing vulnerabilities through adversarial emulation and red teaming. These new certifications reflect our continued investment in building secure, trustworthy AI systems — not only in our technology, but across our entire organization.

ModelAudit vs ModelScan: Comparing ML Model Security Scanners

Ian Webster
Engineer & OWASP Gen AI Red Teaming Contributor

As organizations increasingly adopt machine learning models from various sources, ensuring their security has become critical. Two tools have emerged to address this need: Promptfoo's ModelAudit and Protect AI's ModelScan.

To help security teams understand their options, we conducted an comparison using 11 test files containing documented security vulnerabilities. The setup of this comparison is entirely open-source and can be accessed on Github.

Harder, Better, Prompter, Stronger: AI system prompt hardening

Tabs Fakier
Founding Developer Advocate

This article assumes you're familiar with system prompts. If not, here are examples.

As AI technology advances, so must security practices. System prompts drive AI behavior, so naturally, we want these as robust as possible; enter system prompt hardening. Increased reuse of system prompts affects their prevalence, affecting the number of attacks to manipulate them.

No one would look at their AI's mangled output and decide 'Maybe the real treasure was the prompts we made along the way' with the problem being the opposite.