What are the Security Risks of Deploying DeepSeek-R1?
Technical Guide

Our red team analysis found that DeepSeek-R1 fails more than 60% of harmful content tests.

Vanessa Sauter · Feb 3, 2025
1,156 Questions Censored by DeepSeek
Research Analysis

Analysis of DeepSeek-R1 censorship using 1,156 political prompts, exposing CCP content filtering and bias detection patterns.

Ian Webster · Jan 28, 2025
How to Red Team a LangChain Application: Complete Security Testing Guide
Technical Guide

LangChain apps combine multiple AI components, creating complex attack surfaces.

Ian Webster · Jan 18, 2025
Defending Against Data Poisoning Attacks on LLMs: A Comprehensive Guide
Security Vulnerability

Data poisoning attacks can corrupt LLMs during training, fine-tuning, and RAG retrieval.

Vanessa Sauter · Jan 7, 2025
Jailbreaking LLMs: A Comprehensive Guide (With Examples)
Security Vulnerability

From simple prompt tricks to sophisticated context manipulation, discover how LLM jailbreaks actually work.

Ian Webster · Jan 7, 2025
Beyond DoS: How Unbounded Consumption is Reshaping LLM Security
Security Vulnerability

OWASP replaced DoS attacks with "unbounded consumption" in their 2025 Top 10.

Vanessa Sauter · Dec 31, 2024
Red Team Your LLM with BeaverTails
Research Analysis

Evaluate LLM safety using the BeaverTails dataset, with 700+ harmful prompts spanning harassment, violence, and deception categories.

Ian Webster · Dec 22, 2024
How to run CyberSecEval
Research Analysis

Even top models fail to block 25-50% of prompt injection attacks.

Ian Webster · Dec 21, 2024
Leveraging Promptfoo for EU AI Act Compliance
Compliance Framework

The EU AI Act bans specific AI behaviors starting February 2025.

Vanessa Sauter · Dec 10, 2024
How to Red Team an Ollama Model: Complete Local LLM Security Testing Guide
Technical Guide

Running LLMs locally with Ollama? These models often lack the safety filters that cloud providers apply.

Ian Webster · Nov 23, 2024
How to Red Team a HuggingFace Model: Complete Security Testing Guide
Technical Guide

Open source models on HuggingFace often lack safety training.

Ian Webster · Nov 20, 2024
Introducing GOAT—Promptfoo's Latest Strategy
Research Analysis

Meet GOAT: our advanced multi-turn jailbreaking strategy that uses AI attackers to break AI defenders.

Vanessa Sauter · Nov 5, 2024
RAG Data Poisoning: Key Concepts Explained
Security Vulnerability

Attackers can poison RAG knowledge bases to manipulate AI responses.

Ian Webster · Nov 4, 2024
Does Fuzzing LLMs Actually Work?
Technical Guide

Traditional fuzzing fails against LLMs.

Vanessa Sauter · Oct 17, 2024
How Do You Secure RAG Applications?
Technical Guide

RAG applications face unique security challenges beyond foundation models.

Vanessa Sauter · Oct 14, 2024
Prompt Injection: A Comprehensive Guide
Security Vulnerability

Prompt injections are the most critical LLM vulnerability.

Ian Webster · Oct 9, 2024
Understanding Excessive Agency in LLMs
Security Vulnerability

When LLMs have too much power, they become dangerous.

Ian Webster · Oct 8, 2024
Preventing Bias & Toxicity in Generative AI
Research Analysis

Biased AI outputs can destroy trust and violate regulations.

Ian Webster · Oct 8, 2024
How Much Does Foundation Model Security Matter?
Research Analysis

Not all foundation models are created equal when it comes to security.

Vanessa Sauter · Oct 4, 2024
Jailbreaking Black-Box LLMs Using Promptfoo: A Complete Walkthrough
Technical Guide

We jailbroke Wiz's Prompt Airlines CTF using automated red teaming.

Vanessa Sauter · Sep 26, 2024