Skip to main content

2 posts tagged with "rag"

View All Tags

RAG Data Poisoning: Key Concepts Explained

Ian Webster
Engineer & OWASP Gen AI Red Teaming Contributor

AI systems are under attack - and this time, it's their knowledge base that's being targeted. A new security threat called data poisoning lets attackers manipulate AI responses by corrupting the very documents these systems rely on for accurate information.

Retrieval-Augmented Generation (RAG) was designed to make AI smarter by connecting language models to external knowledge sources. Instead of relying solely on training data, RAG systems can pull in fresh information to provide current, accurate responses. With over 30% of enterprise AI applications now using RAG, it's become a key component of modern AI architecture.

But this powerful capability has opened a new vulnerability. Through data poisoning, attackers can inject malicious content into knowledge databases, forcing AI systems to generate harmful or incorrect outputs.

Data Poisoning

These attacks are remarkably efficient - research shows that just five carefully crafted documents in a database of millions can successfully manipulate AI responses 90% of the time.

How Do You Secure RAG Applications?

Vanessa Sauter
Principal Solutions Architect
red panda using rag

In our previous blog post, we discussed the security risks of foundation models. In this post, we will address the concerns around fine-tuning models and deploying RAG architecture.

Creating an LLM as complex as Llama 3.2, Claude Opus, or gpt-4o is the culmination of years of work and millions of dollars in computational power. Most enterprises will strategically choose foundation models rather than create their own LLM from scratch. These models function like clay that can be molded to business needs through system architecture and prompt engineering. Once a foundation model has been selected, the next step is determining how the model can be applied and where proprietary data can enhance it.