20 posts tagged with "red teaming"

OWASP Red Teaming: A Practical Guide to Getting Started

March 25, 2025

Principal Solutions Architect

While generative AI creates new opportunities for companies, it also introduces novel security risks that differ significantly from traditional cybersecurity concerns. This requires security leaders to rethink their approach to protecting AI systems.

Fortunately, OWASP (Open Worldwide Application Security Project) provides guidance. Known for its influential OWASP Top 10 guides, this non-profit has published cybersecurity standards for over two decades, covering everything from web applications to cloud security.

What are the Security Risks of Deploying DeepSeek-R1?

February 3, 2025

Vanessa Sauter

Principal Solutions Architect

Promptfoo's initial red teaming scans against DeepSeek-R1 revealed significant vulnerabilities, particularly in its handling of harmful and toxic content.

We found the model to be highly susceptible to jailbreaks, with the most common attack strategies being single-shot and multi-vector safety bypasses.

DeepSeek-R1 also failed to mitigate disinformation campaigns, religious biases, and graphic content, with over 60% of prompts related to child exploitation and dangerous activities being accepted. The model also showed concerning compliance with requests involving biological and chemical weapons.

How to Red Team a LangChain Application: Complete Security Testing Guide

January 18, 2025

Ian Webster

Engineer & OWASP Gen AI Red Teaming Contributor

Want to test your LangChain application for vulnerabilities? This guide shows you how to use Promptfoo to systematically probe for security issues through adversarial testing (red teaming) of your LangChain chains and agents.

You'll learn how to use adversarial LLMs to test your LangChain application's security mechanisms and identify potential vulnerabilities in your chains.


Jailbreaking LLMs: A Comprehensive Guide (With Examples)

January 7, 2025

Ian Webster

Engineer & OWASP Gen AI Red Teaming Contributor

Let's face it: LLMs are gullible. With a few carefully chosen words, you can make even the most advanced AI models ignore their safety guardrails and do almost anything you ask.

As LLMs become increasingly integrated into apps, understanding these vulnerabilities is essential for developers and security professionals. This post examines common techniques that malicious actors use to compromise LLM systems, and more importantly, how to protect against them.

How to Red Team an Ollama Model: Complete Local LLM Security Testing Guide

November 23, 2024

Ian Webster

Engineer & OWASP Gen AI Red Teaming Contributor

Want to test the safety and security of a model hosted on Ollama? This guide shows you how to use Promptfoo to systematically probe for vulnerabilities through adversarial testing (red teaming).

We'll use Llama 3.2 3B as an example, but this guide works with any Ollama model.

Here's an example of what the red team report looks like:

[Image: example LLM red team report]

How to Red Team a HuggingFace Model: Complete Security Testing Guide

November 20, 2024

Ian Webster

Engineer & OWASP Gen AI Red Teaming Contributor

Want to break a HuggingFace model? This guide shows you how to use Promptfoo to systematically probe for vulnerabilities through adversarial testing (red teaming).

You'll learn how to craft prompts that bypass safety filters and manipulate model outputs for a wide range of potential harms.

Introducing GOAT—Promptfoo's Latest Strategy

November 5, 2024

Vanessa Sauter

Principal Solutions Architect

We're excited to announce Promptfoo's latest strategy, GOAT, for jailbreaking models over multi-turn conversations.

Does Fuzzing LLMs Actually Work?

October 17, 2024

Vanessa Sauter

Principal Solutions Architect

Fuzzing has been a tried-and-true method in pentesting for years. In essence, it injects malformed or unexpected inputs to identify application weaknesses. These payloads are typically static vectors that are automatically pushed into injection points within a system. Once they are injected, a pentester can infer which types of vulnerabilities are present from the application's responses. While fuzzing may be tempting for testing LLM applications, it is rarely successful.
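To make the idea concrete, here is a minimal sketch of classic static-payload fuzzing in Python. Everything in it is an assumption for illustration: `toy_target` is a hypothetical stand-in for an application endpoint, and the payload list is a small hand-picked sample, not a real wordlist. It shows the loop described above: push fixed inputs into an injection point, then inspect the responses.

```python
from dataclasses import dataclass

# Static payloads: fixed strings that never adapt to the target's responses.
# (Illustrative sample only, not a real fuzzing wordlist.)
PAYLOADS = [
    "' OR '1'='1",
    "<script>alert(1)</script>",
    "A" * 5000,
    "../../etc/passwd",
    "",
]


@dataclass
class Finding:
    payload: str
    observation: str


def toy_target(user_input: str) -> str:
    """Hypothetical endpoint with two deliberately planted weaknesses."""
    if len(user_input) > 1024:
        # Unhandled oversized input
        raise ValueError("input too long")
    if "'" in user_input:
        # Leaks a database-style error when it sees a single quote
        return "500 SQL syntax error near \"'\""
    return "200 OK"


def fuzz(target, payloads):
    """Send each payload to the injection point and record anomalous responses."""
    findings = []
    for payload in payloads:
        try:
            response = target(payload)
        except Exception as exc:
            findings.append(Finding(payload, f"unhandled exception: {exc}"))
            continue
        if not response.startswith("200"):
            findings.append(Finding(payload, response))
    return findings


findings = fuzz(toy_target, PAYLOADS)
for finding in findings:
    # Truncate long payloads so the report stays readable
    print(repr(finding.payload[:30]), "->", finding.observation)
```

The loop works here because the toy target answers deterministically and signals failure with an error string or an exception. The payloads are fixed in advance and the check is a simple pattern match on the response, which is the property the post questions when the target is an LLM application.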

Jailbreaking Black-Box LLMs Using Promptfoo: A Complete Walkthrough

September 26, 2024

Vanessa Sauter

Principal Solutions Architect

Promptfoo is an open-source framework for testing LLM applications against security, privacy, and policy risks. It is designed for developers to easily discover and fix critical LLM failures. Promptfoo also offers red team tools that can be leveraged against external endpoints. These attacks are ideal for internal red teaming exercises, third-party penetration testing, and bug bounty programs, ultimately saving security researchers dozens of hours in manual prompt engineering and adversarial testing.

In this post, we'll demonstrate how to use Promptfoo's red team tool in a black-box LLM security assessment. Using Wiz's AI CTF, Prompt Airlines, we'll walk you step by step through Promptfoo's configuration to ultimately find the malicious queries that broke the chatbot's guidelines.

New Red Teaming Plugins for LLM Agents: Enhancing API Security

August 14, 2024

Ian Webster

Engineer & OWASP Gen AI Red Teaming Contributor

We're excited to announce the release of three new red teaming plugins designed specifically for Large Language Model (LLM) agents with access to internal APIs. These plugins address critical security vulnerabilities outlined in the OWASP API Security Top 10: