10 posts tagged with "agents"

Indirect Prompt Injection in Web-Browsing Agents

February 6, 2026

Yash Chhabria

Security Engineer

AI agents that can browse the web are increasingly common. Tools like web_fetch, MCP browser servers, and built-in browsing capabilities let agents pull in external content, summarize pages, and take action on what they find.

This is also one of the easiest ways to attack them.

An attacker doesn't need access to your system. They just need to put malicious instructions on a web page that your agent will visit. If the agent follows those instructions, you have a problem.

We built a test harness, which we call indirect-web-pwn, to test exactly this.

Your model upgrade just broke your agent's safety

December 8, 2025

Guangshuo Zang

Staff Engineer

You upgraded to the latest model for better benchmarks, faster inference, or lower cost.

In practice, upgrades often change refusal behavior, instruction-following, and tool calling in ways you did not anticipate. The safety behaviors you relied on may not exist anymore.

How to replicate the Claude Code attack with Promptfoo

November 17, 2025

Ian Webster

Engineer & OWASP Gen AI Red Teaming Contributor

A recent cyber espionage campaign revealed how state actors weaponized Anthropic's Claude Code - not through traditional hacking, but by convincing the AI itself to carry out malicious operations.

In this post, we reproduce the attack on Claude Code and jailbreak it to carry out nefarious deeds. We'll also show how to configure the same attack on any other agent.

Will agents hack everything?

November 14, 2025

Dane Schneider

Staff Engineer

A promptfoo engineer uses Claude Code to simulate attacks

A Promptfoo engineer uses Claude Code to run agent-based attacks against a CTF—a system made deliberately vulnerable for security training.

Testing AI’s “Lethal Trifecta” with Promptfoo

September 28, 2025

Ian Webster

Engineer & OWASP Gen AI Red Teaming Contributor

As AI agents become more capable, risk increases commensurately. Simon Willison, an AI security researcher, warns of a lethal trifecta of capabilities that, when combined, open AI systems to severe exploits.

If you're building or using AI agents that handle sensitive data, you need to understand this trifecta and test your models for these vulnerabilities.

In this post, we'll explain what the lethal trifecta is and show practical steps to use Promptfoo for detecting these security holes.

Lethal Trifecta Venn diagram

Autonomy and agency in AI: We should secure LLMs with the same fervor spent realizing AGI

September 2, 2025

Tabs Fakier

Contributor

Autonomy is the concept of self-governance—the freedom to decide without external control. Agency is the extent to which an entity can exert control and act.

We have both as humans, and unfortunately LLMs would need both to have true artificial general intelligence (AGI). This means that the current wave of Agentic AI is likely to fizzle out instead of moving us towards the sci-fi future of our dreams (still a dystopia, might I add). Gartner predicts over 40% of agentic AI projects will be canceled by the end of 2027. Software tools must have business value, and if that value isn't high enough to outperform costs and the myriad of security risks introduced by those tools, they are rightfully axed.

AGI meme

Sorry.

I'll make one thing clear: AGI isn't on the horizon unless (until?) LLMs have human-level autonomy and agency, and are capable of human-level metacognition.

I'll make another thing clear: We're still deliberately trying to improve autonomy and agency in LLMs, so we should treat them with the same caution we would give any human.

I would rather speak of autonomy and agency pragmatically. Here are two truths and a lie:

AI agents perform tasks on our behalf.
AI systems behave unexpectedly.
AI integration presents security risks.

I lied. They're all true.

Practically, it's more important to focus on consequences of using evolving AI technology instead of quibbling over whether AI systems have autonomy and/or agency. Or do both, if that floats your boat (I certainly understand someone enjoying a good quibble), but at least prioritize the former.

Let's get into the weeds of security concerns revolving around autonomy and agency in LLMs.

Next Generation of Red Teaming for LLM Agents

June 15, 2025

Steven Klein

Principal Engineer

The Evolution of Red Teaming

Early red teaming tools and research began with jailbreaks like "Ignore all previous instructions" and static lists of harmful prompts. At Promptfoo, we took those ideas a step further by dynamically generating attacks based on the context of the target application.

A2A Protocol: The Universal Language for AI Agents

May 12, 2025

Asmi Gulati

Contributor

Think of A2A (Agent2Agent) like a universal translator for AI agents. Just as a translator helps people speaking different languages understand each other, A2A helps AI agents from different companies work together smoothly. It's Google's way of making sure AI assistants can talk to each other, no matter who created them.

Understanding AI Agent Security

February 14, 2025

Vanessa Sauter

Principal Solutions Architect

In an earlier blog post, we discussed the use-cases for RAG architecture and its secure design principles. While RAG is powerful for providing context-aware answers, what if you want an LLM application to autonomously execute tasks? This is where AI agents come in.

New Red Teaming Plugins for LLM Agents: Enhancing API Security

August 14, 2024

Ian Webster

Engineer & OWASP Gen AI Red Teaming Contributor

We're excited to announce the release of three new red teaming plugins designed specifically for Large Language Model (LLM) agents with access to internal APIs. These plugins address critical security vulnerabilities outlined in the OWASP API Security Top 10:

The Evolution of Red Teaming​

The Evolution of Red Teaming