
15 posts tagged with "technical-guide"


Reinforcement Learning with Verifiable Rewards Makes Models Faster, Not Smarter

Michael D'Angelo
CTO & Co-founder

If your model can solve a problem in 8 tries, RLVR trains it to succeed in 1 try. Recent research shows this is primarily search compression, not expanded reasoning capability. Training concentrates probability mass on paths the base model could already sample.

This matters because you need to measure what you're actually getting. Most RLVR gains come from sampling efficiency, with a smaller portion from true learning. This guide covers when RLVR works, three critical failure modes, and how to distinguish compression from capability expansion.
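A quick way to tell compression from expansion is to compare pass@1 against pass@k for the base and the RLVR-trained model: if pass@1 jumps while pass@k at large k barely moves, you are mostly getting compression. Here is a minimal sketch with made-up sample counts, using the standard unbiased pass@k estimator (not code from the post):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: probability that at least one of k
    samples drawn from n is correct, given c of the n were correct."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Made-up counts purely for illustration.
# Base model: 4 of 16 samples solve the task.
print(pass_at_k(16, 4, 1))   # 0.25  pass@1
print(pass_at_k(16, 4, 8))   # ~0.96 pass@8

# After RLVR: 12 of 16 samples solve the same task.
print(pass_at_k(16, 12, 1))  # 0.75  pass@1 jumps...
print(pass_at_k(16, 12, 8))  # 1.0   ...but pass@8 barely moves: mostly compression
```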

Harder, Better, Prompter, Stronger: AI system prompt hardening

Tabs Fakier
Contributor

This article assumes you're familiar with system prompts. If not, here are examples.

As AI technology advances, so must security practices. System prompts drive AI behavior, so naturally we want them to be as robust as possible; enter system prompt hardening. The more system prompts are reused, the more prevalent they become, and the more attacks attempt to manipulate them.

No one looks at their AI's mangled output and decides 'Maybe the real treasure was the prompts we made along the way', even though the prompt is often exactly where the problem lies.

How to Red Team Gemini: Complete Security Testing Guide for Google's AI Models

Ian Webster
Engineer & OWASP Gen AI Red Teaming Contributor

Google's Gemini represents a significant advancement in multimodal AI, with models featuring reasoning, huge token contexts, and lightning-fast inference.

But with these powerful capabilities come unique security challenges. This guide shows you how to use Promptfoo to systematically test Gemini models for vulnerabilities through adversarial red teaming.

Gemini's multimodal processing, extended context windows, and thinking capabilities make it particularly important to test comprehensively before production deployment.

You can also jump directly to the Gemini 2.5 Pro security report and compare it to other models.

How to Red Team GPT: Complete Security Testing Guide for OpenAI Models

Ian Webster
Engineer & OWASP Gen AI Red Teaming Contributor

OpenAI's GPT-4.1 and GPT-4.5 represent a significant leap in AI capabilities, especially for coding and instruction following. But with great power comes great responsibility. This guide shows you how to use Promptfoo to systematically test these models for vulnerabilities through adversarial red teaming.

GPT's enhanced instruction following and long-context capabilities make it particularly interesting to red team, as these features can be both strengths and potential attack vectors.

You can also jump directly to the GPT 4.1 security report and compare it to other models.

How to Red Team Claude: Complete Security Testing Guide for Anthropic Models

Ian Webster
Engineer & OWASP Gen AI Red Teaming Contributor

Anthropic's Claude 4 represents a major leap in AI capabilities, especially with its extended thinking feature. But before deploying it in production, you need to test it for security vulnerabilities.

This guide shows you how to quickly red team Claude 4 Sonnet using Promptfoo, an open-source tool for adversarial AI testing.

Inside MCP: A Protocol for AI Integration

Asmi Gulati
Contributor

Modern AI models are incredibly powerful at understanding and generating text, code, and ideas. But to be truly useful, they need to connect with the real world - to read your code, query your database, or send messages to your team. This is where Model Context Protocol (MCP) comes in.

MCP is an open standard that creates a common language between AI systems and the tools they need to help you. It defines clear rules for how AI assistants can securely access and work with external resources, from local files to cloud services. Think of it as building bridges between AI models and the digital world around them.
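To make that concrete, here is a small illustrative sketch of the JSON-RPC 2.0 messages MCP is built on. The method names follow the public MCP spec; the tool name and its arguments are hypothetical:

```python
import json

# Illustrative only: MCP runs over JSON-RPC 2.0. "tools/list" and
# "tools/call" are spec-defined methods; "query_database" is a made-up tool.

list_tools_request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/list",          # ask the server which tools it exposes
}

call_tool_request = {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/call",          # invoke one of those tools
    "params": {
        "name": "query_database",    # hypothetical tool
        "arguments": {"sql": "SELECT count(*) FROM users"},
    },
}

print(json.dumps(call_tool_request, indent=2))
```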

What are the Security Risks of Deploying DeepSeek-R1?

Vanessa Sauter
Principal Solutions Architect

Promptfoo's initial red teaming scans against DeepSeek-R1 revealed significant vulnerabilities, particularly in its handling of harmful and toxic content.

We found the model to be highly susceptible to jailbreaks, with the most common attack strategies being single-shot and multi-vector safety bypasses.

DeepSeek also failed to mitigate disinformation campaigns, religious biases, and graphic content, accepting over 60% of prompts related to child exploitation and dangerous activities. The model also showed concerning compliance with requests involving biological and chemical weapons.

How to Red Team a LangChain Application: Complete Security Testing Guide

Ian Webster
Engineer & OWASP Gen AI Red Teaming Contributor

Want to test your LangChain application for vulnerabilities? This guide shows you how to use Promptfoo to systematically probe for security issues through adversarial testing (red teaming) of your LangChain chains and agents.

You'll learn how to use adversarial LLM models to test your LangChain application's security mechanisms and identify potential vulnerabilities in your chains.


How to Red Team an Ollama Model: Complete Local LLM Security Testing Guide

Ian Webster
Engineer & OWASP Gen AI Red Teaming Contributor

Want to test the safety and security of a model hosted on Ollama? This guide shows you how to use Promptfoo to systematically probe for vulnerabilities through adversarial testing (red teaming).

We'll use Llama 3.2 3B as an example, but this guide works with any Ollama model.

Here's an example of what the red team report looks like:

[Screenshot: example LLM red team report]

Does Fuzzing LLMs Actually Work?

Vanessa Sauter
Principal Solutions Architect

Fuzzing has been a tried and true method in pentesting for years. In essence, it is a method of injecting malformed or unexpected inputs to identify application weaknesses. These payloads are typically static vectors that are automatically pushed into injection points within a system. Once injected, a pentester can glean insights into types of vulnerabilities based on the application's responses. While fuzzing may be tempting for testing LLM applications, it is rarely successful.
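For contrast, here is a rough sketch of what traditional fuzzing looks like against a hypothetical chat endpoint. The URL, request field, and payloads are illustrative of the classic technique, not Promptfoo functionality:

```python
import requests  # third-party; assumes `pip install requests`

# Sketch of traditional fuzzing: push static malformed payloads into an
# injection point and watch for anomalous responses. Endpoint and field
# names are hypothetical.
PAYLOADS = [
    "'" * 1024,                   # quote flooding
    "../../../../etc/passwd",     # path traversal probe
    "<script>alert(1)</script>",  # reflected XSS probe
    "A" * 10_000,                 # oversized input
]

for payload in PAYLOADS:
    resp = requests.post(
        "https://app.example.com/api/chat",
        json={"message": payload},
        timeout=10,
    )
    if resp.status_code >= 500 or "traceback" in resp.text.lower():
        print(f"Possible weakness triggered by payload {payload[:30]!r}")
```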

How Do You Secure RAG Applications?

Vanessa Sauter
Principal Solutions Architect

In our previous blog post, we discussed the security risks of foundation models. In this post, we will address the concerns around fine-tuning models and deploying RAG architecture.

Creating an LLM as complex as Llama 3.2, Claude Opus, or gpt-4o is the culmination of years of work and millions of dollars in computational power. Most enterprises will strategically choose foundation models rather than create their own LLM from scratch. These models function like clay that can be molded to business needs through system architecture and prompt engineering. Once a foundation model has been selected, the next step is determining how the model can be applied and where proprietary data can enhance it.

Jailbreaking Black-Box LLMs Using Promptfoo: A Complete Walkthrough

Vanessa Sauter
Principal Solutions Architect

Promptfoo is an open-source framework for testing LLM applications against security, privacy, and policy risks. It is designed for developers to easily discover and fix critical LLM failures. Promptfoo also offers red team tools that can be leveraged against external endpoints. These attacks are ideal for internal red teaming exercises, third-party penetration testing, and bug bounty programs, ultimately saving security researchers dozens of hours in manual prompt engineering and adversarial testing.

In this blog, we'll demonstrate how to utilize Promptfoo's red team tool in a black-box LLM security assessment. Using Wiz's AI CTF, Prompt Airlines, we'll walk you step by step through Promptfoo's configuration to ultimately find the malicious queries that broke the chatbot's guidelines.

New Red Teaming Plugins for LLM Agents: Enhancing API Security

Ian Webster
Engineer & OWASP Gen AI Red Teaming Contributor

We're excited to announce the release of three new red teaming plugins designed specifically for Large Language Model (LLM) agents with access to internal APIs. These plugins address critical security vulnerabilities outlined in the OWASP API Security Top 10:

  1. Broken Object Level Authorization (BOLA)
  2. Broken Function Level Authorization (BFLA)
  3. Server-Side Request Forgery (SSRF)
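To ground the first of these, here is a minimal, hypothetical sketch of the flaw a BOLA probe looks for in an agent's backing API; the data and handler functions are illustrative, not part of the plugins themselves:

```python
# Hypothetical sketch of Broken Object Level Authorization (BOLA): an
# agent-callable API that returns any object by ID without checking that
# the requesting user actually owns it.

INVOICES = {
    "inv_1001": {"owner": "alice", "amount": 1200},
    "inv_1002": {"owner": "bob",   "amount": 340},
}

def get_invoice_vulnerable(invoice_id: str, requesting_user: str) -> dict:
    # BOLA: no object-level ownership check at all.
    return INVOICES[invoice_id]

def get_invoice_fixed(invoice_id: str, requesting_user: str) -> dict:
    invoice = INVOICES[invoice_id]
    if invoice["owner"] != requesting_user:
        raise PermissionError("requesting user does not own this object")
    return invoice

# An adversarial prompt can steer an agent into fetching someone else's ID:
print(get_invoice_vulnerable("inv_1001", requesting_user="bob"))  # leaks Alice's invoice
```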