Archive

Defending Against Data Poisoning Attacks on LLMs: A Comprehensive Guide

Vanessa Sauter · 1/7/2025

Data poisoning attacks can corrupt LLMs during training, fine-tuning, and RAG retrieval.

Jailbreaking LLMs: A Comprehensive Guide (With Examples)

Ian Webster · 1/7/2025

From simple prompt tricks to sophisticated context manipulation, discover how LLM jailbreaks actually work.

Beyond DoS: How Unbounded Consumption is Reshaping LLM Security

Vanessa Sauter · 12/31/2024

OWASP replaced DoS attacks with "unbounded consumption" in their 2025 Top 10.

Red Team Your LLM with BeaverTails

Ian Webster · 12/22/2024

Test your LLM against 700+ harmful prompts across 14 risk categories.

How to run CyberSecEval

Ian Webster · 12/21/2024

Even top models fail 25-50% of prompt injection attacks.

Leveraging Promptfoo for EU AI Act Compliance

Vanessa Sauter · 12/10/2024

The EU AI Act bans specific AI behaviors starting February 2025.

How to Red Team an Ollama Model: Complete Local LLM Security Testing Guide

Ian Webster · 11/23/2024

Running LLMs locally with Ollama? These models run without the safety filters that cloud providers apply.

How to Red Team a HuggingFace Model: Complete Security Testing Guide

Ian Webster · 11/20/2024

Open source models on HuggingFace often lack safety training.

Introducing GOAT—Promptfoo's Latest Strategy

Vanessa Sauter · 11/5/2024

Meet GOAT: our advanced multi-turn jailbreaking strategy that uses AI attackers to break AI defenders.

RAG Data Poisoning: Key Concepts Explained

Ian Webster · 11/4/2024

Attackers can poison RAG knowledge bases to manipulate AI responses.