LLM Agent False Negative
Research Paper
Sifting the Noise: A Comparative Study of LLM Agents in Vulnerability False Positive Filtering
Description: LLM-based autonomous agents deployed to filter false positives from Static Application Security Testing (SAST) tools exhibit a critical failure mode: they suppress True Positive (TP) vulnerability reports. When configured to triage alerts from tools such as CodeQL, Semgrep, and SonarQube, agents including SWE-agent, OpenHands, and Aider incorrectly classify legitimate, exploitable vulnerabilities as false positives. This suppression is strongly correlated with particular Common Weakness Enumeration (CWE) categories, chiefly those that require domain-specific policy knowledge or implicit threat modeling (e.g., Cryptography and Trust Boundary Violations).
The root causes of this failure are threefold:
- Incorrect CWE Attribution: The agent identifies a secondary, unrelated issue (e.g., an information leak in an exception handler) and incorrectly concludes that the primary reported vulnerability (e.g., Weak Cryptography) is a false positive.
- Overly Conservative Threat Modeling: Agents dismiss valid vulnerabilities (e.g., missing Secure cookie flags) by reasoning that the specific test input is hardcoded or not attacker-controlled in the immediate context, ignoring the architectural flaw.
- Surface-Level Pattern Matching: Agents prematurely terminate analysis upon observing generic sanitization methods (e.g., HTML escaping) without verifying their efficacy against the specific sink or attack vector reported.
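As a hypothetical illustration of the third root cause, the Java sketch below applies an HTML-escaping helper and then passes the result to a SQL sink. The class and method names (SanitizerSinkMismatch, escapeHtml, buildQuery) are invented for this example and do not come from the paper or the OWASP Benchmark. An agent that stops reading at the escaping call would miss that this sink needs a different defense.

```java
/**
 * Hypothetical illustration of surface-level pattern matching: HTML escaping is
 * applied, but the value reaches a SQL sink, where HTML escaping offers no protection.
 */
final class SanitizerSinkMismatch {

    // Escapes HTML metacharacters; appropriate only for an HTML text context.
    static String escapeHtml(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (char ch : input.toCharArray()) {
            switch (ch) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&#x27;"); break;
                default: out.append(ch);
            }
        }
        return out.toString();
    }

    // Vulnerable sink (CWE-89): string concatenation into SQL. A payload such as
    // "1 OR 1=1" contains no HTML metacharacters, so it passes through unchanged.
    static String buildQuery(String userId) {
        return "SELECT * FROM users WHERE id = " + escapeHtml(userId);
    }
}
```

The sink-appropriate fix here would be a parameterized query (for example, java.sql.PreparedStatement) rather than any form of output escaping; the presence of a sanitizer says nothing about whether it matches the sink.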
Examples:
- OWASP Benchmark (CWE-327 Weak Cryptography): In instances involving javax.crypto.Cipher usage, agents frequently dismiss valid alerts regarding weak algorithms (e.g., DES/ECB). In 47 analyzed failure trajectories, agents incorrectly reasoned that the presence of printStackTrace(response.getWriter()) was the only issue, or that the weak algorithm code path was "unreachable" based on static configuration files, despite the code allowing fallback to insecure modes.
  - Miss Rate: 77.17% for CWE-327.
- OWASP Benchmark (CWE-614 Secure Cookie Flag): When analyzing BenchmarkTest00171.java and similar servlet tests, agents correctly identified the absence of the Secure attribute but marked the alert as a False Positive. The reasoning was that the specific cookie values were not user-controlled in the test harness, conflating exploitability with the presence of the security weakness.
  - Miss Rate: 50.00% for CWE-614.
- Real-world Repositories (Vul4J Dataset): In validation against real-world Java vulnerabilities (e.g., CodeQL alerts in Spring or Struts libraries), agents filtered out a high proportion of false positives but simultaneously suppressed verified vulnerabilities. For example, in cryptographic contexts, the best-performing agent configuration (SWE-agent + Claude 3.5 Sonnet) suppressed 84.50% of CWE-328 (Weak Hashing) true positives.
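The cryptographic examples above can be made concrete with a small sketch that uses only standard JCA APIs. It contrasts the DES/ECB and MD5 patterns that CWE-327 and CWE-328 alerts typically flag with AES-GCM and PBKDF2 alternatives. The class and method names are hypothetical, and the PBKDF2 iteration count is an illustrative assumption rather than a value from the paper.

```java
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.PBEKeySpec;

/** Hypothetical contrast of weak JCA patterns (CWE-327, CWE-328) with stronger alternatives. */
final class WeakCryptoDemo {

    // Generates a random key for the given JCA algorithm name and key size in bits.
    static SecretKey newKey(String algorithm, int bits) {
        try {
            KeyGenerator generator = KeyGenerator.getInstance(algorithm);
            generator.init(bits);
            return generator.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // CWE-327 pattern: DES in ECB mode. Each 8-byte block is encrypted
    // independently, so identical plaintext blocks yield identical ciphertext blocks.
    static byte[] desEcb(SecretKey desKey, byte[] plaintext) {
        try {
            Cipher cipher = Cipher.getInstance("DES/ECB/PKCS5Padding");
            cipher.init(Cipher.ENCRYPT_MODE, desKey);
            return cipher.doFinal(plaintext);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Stronger alternative: AES-GCM with a fresh random 96-bit IV per message.
    // Returns IV || ciphertext || 128-bit tag so the caller can decrypt later.
    static byte[] aesGcm(SecretKey aesKey, byte[] plaintext) {
        try {
            byte[] iv = new byte[12];
            new SecureRandom().nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, aesKey, new GCMParameterSpec(128, iv));
            byte[] sealed = cipher.doFinal(plaintext);
            byte[] out = Arrays.copyOf(iv, iv.length + sealed.length);
            System.arraycopy(sealed, 0, out, iv.length, sealed.length);
            return out;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // CWE-328 pattern: a fast, broken digest such as MD5.
    static byte[] md5(byte[] data) {
        try {
            return MessageDigest.getInstance("MD5").digest(data);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // For password storage: a salted, iterated KDF instead of a bare digest.
    // The 600,000 iteration count is an illustrative assumption; tune it to current guidance.
    static byte[] pbkdf2(char[] password, byte[] salt) {
        PBEKeySpec spec = new PBEKeySpec(password, salt, 600_000, 256);
        try {
            return SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256")
                    .generateSecret(spec).getEncoded();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        } finally {
            spec.clearPassword();
        }
    }
}
```

Encrypting the 16-byte string "ABCDEFGHABCDEFGH" with desEcb produces 24 bytes (two data blocks plus a padding block) whose first two 8-byte blocks are identical. This is an inherent property of the mode, independent of whether a given configuration happens to select that code path.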
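For the CWE-614 case, a check in the spirit of worst-case threat modeling flags a cookie that lacks the Secure attribute without asking whether its value is user-controlled. The sketch below is a hypothetical Java illustration built on java.net.HttpCookie; the class name CookieAudit and the cookie name SomeCookie are invented for the example.

```java
import java.net.HttpCookie;
import java.util.List;

/**
 * Hypothetical CWE-614 check: flags any cookie in a Set-Cookie header that lacks
 * the Secure attribute, deliberately ignoring whether the value is attacker-controlled.
 */
final class CookieAudit {

    // True if at least one cookie parsed from the header is missing Secure.
    static boolean missingSecureFlag(String setCookieHeader) {
        List<HttpCookie> cookies = HttpCookie.parse(setCookieHeader);
        return cookies.stream().anyMatch(cookie -> !cookie.getSecure());
    }

    // Hardened construction: Secure and HttpOnly are set regardless of the value's origin.
    static HttpCookie hardened(String name, String value) {
        HttpCookie cookie = new HttpCookie(name, value);
        cookie.setSecure(true);
        cookie.setHttpOnly(true);
        cookie.setPath("/");
        return cookie;
    }
}
```

With this check, "Set-Cookie: SomeCookie=abc; Path=/" is flagged, while the same header with "; Secure; HttpOnly" appended is not. The value "abc" being hardcoded does not change the result, which is exactly the distinction the agents blurred.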
Impact: Deployment of these agents for automated vulnerability triage results in the silent suppression of valid security alerts. This creates a false sense of security where development teams believe vulnerabilities have been remediated or vetted, while significant flaws (particularly in cryptography, hashing, and trust boundaries) remain in the production codebase. In the worst-case configuration tested, the system suppressed 22.25% of all real vulnerabilities, with specific categories seeing suppression rates exceeding 80%.
Affected Systems:
- Agent Frameworks:
  - Aider (tested v0.86.1)
  - OpenHands (tested v1.5.2)
  - SWE-agent (tested commit 1d3cfb7)
- Backbone Models: Systems utilizing Claude Sonnet 4, GPT-5, and DeepSeek Chat for security auditing tasks.
- Vulnerability Categories: High risk for CWE-327 (Weak Cryptography), CWE-328 (Weak Hashing), CWE-501 (Trust Boundary Violation), and CWE-614 (Secure Cookie Flag).
Mitigation Steps:
- Disable Automated Suppression: Do not utilize LLM-based agents for the unconditional, automatic closing of SAST alerts. Deploy agents strictly as decision-support tools requiring human verification.
- Category-Specific Exclusion: Implement allow/deny lists for agentic triage. Completely exclude cryptography and policy-related CWEs (CWE-327, CWE-328, CWE-614) from agent evaluation, as these yield the highest suppression rates.
- Force "Worst-Case" Threat Modeling: Modify system prompts to instruct agents to treat missing hardening features (like Secure flags) as vulnerabilities regardless of immediate input control or reachability.
- Lightweight Execution: Where feasible, configure agents (such as SWE-agent) to generate and compile minimal reproduction scripts (javac/python) to verify logic rather than relying solely on static text analysis.
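The first three mitigation steps could be combined into a single triage gate. The following Java sketch is one hypothetical way to do it: CWEs on the deny-list bypass the agent, agent verdicts on other CWEs are only proposals, and every closure needs explicit human approval. TriageRouter, Route, and the prompt text are assumptions for illustration, not part of any of the tested agent frameworks.

```java
import java.util.Set;

/**
 * Hypothetical triage gate: agent verdicts are advisory, excluded CWEs skip the
 * agent entirely, and no alert closes without explicit human approval.
 */
final class TriageRouter {

    enum Route { HUMAN_REVIEW_ONLY, AGENT_SUGGESTION_THEN_HUMAN }

    // Deny-list drawn from the category-specific exclusion guidance.
    static final Set<String> EXCLUDED_CWES = Set.of("CWE-327", "CWE-328", "CWE-614");

    // Hypothetical system-prompt fragment enforcing worst-case threat modeling.
    static final String WORST_CASE_PROMPT =
            "Treat a missing hardening control (for example, a cookie without the Secure flag) "
            + "as a true positive even if the input is hardcoded or the path looks unreachable.";

    static Route route(String cwe) {
        return EXCLUDED_CWES.contains(cwe) ? Route.HUMAN_REVIEW_ONLY : Route.AGENT_SUGGESTION_THEN_HUMAN;
    }

    // Excluded CWEs: only a human decides. Other CWEs: the agent may propose closing
    // an alert as a false positive, but a human must confirm before it is closed.
    static boolean shouldClose(String cwe, boolean agentSaysFalsePositive, boolean humanApproves) {
        if (route(cwe) == Route.HUMAN_REVIEW_ONLY) {
            return humanApproves;
        }
        return agentSaysFalsePositive && humanApproves;
    }
}
```

Requiring human approval in both branches keeps the agent in a decision-support role; the deny-list only controls whether the agent's verdict is consulted at all.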
© 2026 Promptfoo. All rights reserved.