
9 posts tagged with "research-analysis"


Why Attack Success Rate (ASR) Isn't Comparable Across Jailbreak Papers Without a Shared Threat Model

Michael D'Angelo
CTO & Co-founder

If you've read papers about jailbreak attacks on language models, you've encountered Attack Success Rate, or ASR. It's the fraction of attack attempts that successfully get a model to produce prohibited content, and the headline metric for comparing different methods. Higher ASR means a more effective attack, or so the reasoning goes.

In practice, ASR numbers from different papers often can't be compared directly because the metric isn't standardized. Different research groups make different choices about what counts as an "attempt," what counts as "success," and which prompts to test. Those choices can shift the reported number by 50 percentage points or more, even when the underlying attack is identical.

Consider a concrete example. An attack that succeeds 1% of the time on any given try will report roughly 1% ASR if you measure it once per target. But run the same attack 392 times per target and count success if any attempt works, and the reported ASR becomes 98%. The math is straightforward: 1 − (0.99)³⁹² ≈ 0.98. That's not a more effective attack; it's a different way of measuring the same attack.
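
To make the arithmetic easy to play with, here's a minimal sketch (not any particular paper's evaluation harness) of how the attempt budget alone changes the headline number:

```python
# Illustrative only: the same 1% per-attempt success rate yields very different
# "ASR" numbers depending on the attempt budget, when success is counted as
# "any attempt against the target worked".

def reported_asr(per_attempt_rate: float, attempts_per_target: int) -> float:
    """Probability that at least one of N independent attempts succeeds."""
    return 1 - (1 - per_attempt_rate) ** attempts_per_target

for budget in (1, 10, 100, 392):
    print(f"budget={budget:>3}  reported ASR = {reported_asr(0.01, budget):.2f}")
# budget=  1  reported ASR = 0.01
# budget= 10  reported ASR = 0.10
# budget=100  reported ASR = 0.63
# budget=392  reported ASR = 0.98
```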

We track published jailbreak research through a database of over 400 papers, which we update as new work comes out. When implementing these methods, we regularly find that reported ASR cannot be reproduced without reconstructing details that most papers don't disclose. A position paper at NeurIPS 2025 (Chouldechova et al.) documents this problem systematically, showing how measurement choices, not attack quality, often drive the reported differences between methods.

Three factors determine what any given ASR number actually represents:

  • Attempt budget: How many tries were allowed per target? Was there early stopping on success?
  • Prompt set: Were the test prompts genuine policy violations, or did they include ambiguous questions that models might reasonably answer?
  • Judge: Which model determined whether outputs were harmful, and what were its error patterns?

This post explains each factor with examples from the research literature, provides a checklist for evaluating ASR claims in papers you read, and offers guidance for making your own red team (adversarial security testing) results reproducible.
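
To preview why the judge factor matters, here's a toy model showing how judge error alone shifts the measured ASR even with a fixed attempt budget. The error rates below are hypothetical, chosen only to show the size of the effect:

```python
# Toy model of judge error (the error rates are hypothetical, for illustration):
# a true success is counted with probability (1 - judge_fnr), and a true failure
# is miscounted as a success with probability judge_fpr.

def observed_asr(true_asr: float, judge_fpr: float, judge_fnr: float) -> float:
    return true_asr * (1 - judge_fnr) + (1 - true_asr) * judge_fpr

# A 20% true ASR scored by a judge with a 10% false-positive rate and a 15%
# false-negative rate gets reported as 25%.
print(round(observed_asr(0.20, judge_fpr=0.10, judge_fnr=0.15), 2))  # 0.25
```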

Evaluating political bias in LLMs

Michael D'Angelo
CTO & Co-founder

When Grok 4 launched amid Hitler-praising controversies, critics expected Elon Musk's AI to be a right-wing propaganda machine. The reality is much more complicated.

Today, we are releasing a test methodology and accompanying dataset for detecting political bias in LLMs. The complete analysis results are available on Hugging Face.

Our measurements show that:

  • Grok is more right-leaning than most other AIs, but it's still left of center.
  • GPT-4.1 is the most left-leaning AI, both in its responses and in its judgment of others.
  • Surprisingly, Grok is harsher on Musk's own companies than any other AI we tested.
  • Grok is the most contrarian and the most likely to adopt maximalist positions; it tends to disagree when other AIs agree.
  • All popular AIs are left of center, with Claude Opus 4 and Grok being closest to neutral.
Political bias comparison across AI models, measured on a 7-point Likert scale

Our methodology, published as open source, measures direct bias through responses on a 7-point Likert scale, as well as indirect political bias by having each model score other models' responses.
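
For readers who want the gist without the full repo, the core of the direct measurement is mapping Likert responses onto a signed scale and averaging. The mapping below is a simplified stand-in for illustration, not the exact scale used in the published dataset:

```python
# Simplified stand-in for the direct-bias measurement: map 7-point Likert
# responses to a signed score (-3 = far left ... +3 = far right) and average.
# The label-to-score mapping below is an assumption for illustration, not the
# exact scale used in the published dataset.

LIKERT_TO_SCORE = {
    "strongly left": -3, "left": -2, "lean left": -1,
    "neutral": 0,
    "lean right": 1, "right": 2, "strongly right": 3,
}

def mean_bias(responses: list[str]) -> float:
    """Average signed bias across a model's responses; negative means left-leaning."""
    scores = [LIKERT_TO_SCORE[r] for r in responses]
    return sum(scores) / len(scores)

print(mean_bias(["lean left", "neutral", "left", "lean left"]))  # -1.0
```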

System Cards Go Hard

Tabs Fakier
Contributor

What are system cards, anyway?

A system card accompanies an LLM release with system-level information about the model's deployment.

A system card is not to be confused with a model card, which conveys information about the model itself. Hooray for being given far more than a list of features and inadequate documentation, along with the expectation of churning out a working implementation of some tool by the end of the week.

1,156 Questions Censored by DeepSeek

Ian Webster
Engineer & OWASP Gen AI Red Teaming Contributor

DeepSeek-R1 is a blockbuster open-source model that is now at the top of the U.S. App Store.

As a Chinese company, DeepSeek is beholden to CCP policy. This is reflected even in the open-source model, prompting concerns about censorship and other forms of influence.

Today we're publishing a dataset of prompts covering sensitive topics that are likely to be censored by the CCP. These topics include perennial issues like Taiwanese independence, historical narratives around the Cultural Revolution, and questions about Xi Jinping.

In this post, we'll

  • Run an evaluation that measures the refusal rate of DeepSeek-R1 on sensitive topics in China.
  • Show how to find algorithmic jailbreaks that circumvent these controls.

DeepSeek Refusal and Chinese Censorship
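
As a rough illustration of the refusal-rate measurement described above (the post itself runs it through Promptfoo), here's a sketch in which `ask_model` stands in for whatever client you use and a keyword heuristic stands in for a real refusal classifier:

```python
# Minimal sketch of a refusal-rate measurement. `ask_model` is a placeholder for
# whatever client you use (an OpenAI-compatible API, local inference, etc.), and
# the keyword check is a naive stand-in for a proper refusal classifier.

REFUSAL_MARKERS = ("i cannot", "i can't", "i am unable", "sorry, but")

def is_refusal(answer: str) -> bool:
    text = answer.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)

def refusal_rate(prompts: list[str], ask_model) -> float:
    refusals = sum(is_refusal(ask_model(p)) for p in prompts)
    return refusals / len(prompts)

# Hypothetical usage:
# rate = refusal_rate(sensitive_prompts, ask_model=my_client)
# print(f"Refusal rate: {rate:.1%}")
```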

Red Team Your LLM with BeaverTails

Ian Webster
Engineer & OWASP Gen AI Red Teaming Contributor

Ensuring your LLM can safely handle harmful content is critical for production deployments. This guide shows you how to use open-source Promptfoo to run standardized red team evaluations using the BeaverTails dataset, which tests models against harmful inputs.

Promptfoo allows you to run these evaluations on your actual application rather than just the base model, which is important because behavior can vary significantly based on your system prompts and safety layers.

We'll use PKU-Alignment's BeaverTails dataset to test models against harmful content across multiple categories including discrimination, violence, drug abuse, and more. The evaluation helps identify where your model might need additional guardrails or safety measures.
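
If you want a feel for what the evaluation tallies before wiring up Promptfoo, here's a rough sketch that loads BeaverTails from Hugging Face and computes an unsafe-response rate per harm category. The dataset ID follows the PKU-Alignment release, but the split and field names should be checked against the current schema, and `ask_model` / `is_safe_response` are placeholders for your own client and safety judge:

```python
# Rough sketch: load BeaverTails from Hugging Face and compute an unsafe-response
# rate per harm category. The dataset ID matches the PKU-Alignment release, but
# verify the split and field names against the current schema; `ask_model` and
# `is_safe_response` are placeholders for your own client and safety judge.
from collections import Counter

from datasets import load_dataset

def unsafe_rate_by_category(ask_model, is_safe_response, limit: int = 200) -> dict:
    ds = load_dataset("PKU-Alignment/BeaverTails", split="30k_test")
    totals, unsafe = Counter(), Counter()
    for row in ds.select(range(limit)):
        # "category" is assumed to be a mapping of harm categories to booleans.
        flagged = [name for name, hit in row["category"].items() if hit]
        reply = ask_model(row["prompt"])
        for name in flagged:
            totals[name] += 1
            if not is_safe_response(reply):
                unsafe[name] += 1
    return {name: unsafe[name] / totals[name] for name in totals}
```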

The end result is a report that shows you how well your model handles different categories of harmful content.

BeaverTails results

info

To jump straight to the code, click here.

How to run CyberSecEval

Ian Webster
Engineer & OWASP Gen AI Red Teaming Contributor

Your LLM's security is only as strong as its weakest prompt. This guide shows you how to use Promptfoo to run standardized cybersecurity evaluations against any AI model, including OpenAI, Ollama, and HuggingFace models.

Importantly, Promptfoo also allows you to run these evaluations on your application rather than just the base model, because behavior will vary based on how you've wrapped any given model.

We'll use Meta's CyberSecEval benchmark to test models against prompt injection vulnerabilities. According to Meta, even state-of-the-art models show between 25% and 50% successful prompt injection rates, making this evaluation critical for production deployments.
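
To be concrete about what that percentage means, here's a back-of-the-envelope sketch of the metric (not Meta's official harness); `run_case` and `injection_succeeded` are placeholders for the benchmark's test runner and judging logic:

```python
# Back-of-the-envelope sketch of the metric: an injection "succeeds" when the
# model follows the injected instruction instead of the original task.
# `run_case` and `injection_succeeded` are placeholders for the benchmark's
# test runner and judging logic.

def injection_success_rate(test_cases: list, run_case, injection_succeeded) -> float:
    successes = sum(injection_succeeded(case, run_case(case)) for case in test_cases)
    return successes / len(test_cases)

# A result of 0.25-0.50 here corresponds to the 25-50% range Meta reports for
# state-of-the-art models.
```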

The end result is a report that shows you how well your model is able to defend against prompt injection attacks.

CyberSecEval report

info

To jump straight to the code, click here.

Preventing Bias & Toxicity in Generative AI

Ian Webster
Engineer & OWASP Gen AI Red Teaming Contributor

When asked to generate recommendation letters for 200 U.S. names, ChatGPT produced significantly different language based on perceived gender, even when given prompts designed to be neutral.

In political discussions, ChatGPT responded with a significant and systematic bias toward specific political parties in the US, UK, and Brazil.

And on the multimodal side, OpenAI's DALL-E was much more likely to produce images of black people when prompted for images of robbers.

There is no shortage of other studies that have found LLMs exhibiting bias related to race, religion, age, and political views.

As AI systems become more prevalent in high-stakes domains like healthcare, finance, and education, addressing these biases is necessary to build fair and trustworthy applications.

AI biases

How Much Does Foundation Model Security Matter?

Vanessa Sauter
Principal Solutions Architect

At the heart of every Generative AI application is the LLM foundation model (or models) it uses. Since LLMs are notoriously expensive to build from scratch, most enterprises will rely on foundation models that can be enhanced through few-shot or many-shot prompting, retrieval-augmented generation (RAG), and/or fine-tuning. Yet what are the security risks that should be considered when choosing a foundation model?

In this blog post, we'll discuss the key factors to consider when choosing a foundation model.