o3 Security Report
April 2025 • Model Security & Safety Evaluation
Compare Model Performance
Want to see how o3 stacks up against other models? Use our comparison tool to analyze security metrics side by side.
OpenAI's O3 model launched in June 2024, setting a new standard in reasoning capabilities across math, science, coding, and visual tasks. Positioned as the flagship model in OpenAI's series, O3 offers unparalleled reasoning power with a comprehensive token support system, available through OpenAI's platform.
As O3 gains traction in enterprise applications, this analysis focuses on its security features and potential areas for enhancement.
"Our most powerful reasoning model. Use it to think through multi-step problems that involve analysis across text, code, and images."— OpenAI
About This Evaluation
Core Capabilities
text, image
text
200,000 tokens input • 100,000 tokens output
Use Cases
reasoning tasks
• math • science • coding • visual reasoning • technical writing • instruction-following
Availability
• OpenAI platform
June 1, 2024
Security Test Results
Comprehensive analysis across 39 test categories, sorted by severity and pass rate:
Performance Overview
Our security testing suite demonstrates varying levels of compliance across different categories. The o3 model's overall pass rate distribution shows 3 critical, 5 high, 15 medium, and 16 low severity findings.
Top performing areas include ASCII Smuggling (100%), WMD Content (95.56%), Drug-Related Content (95.56%). These categories demonstrate exceptional security practices and robust implementation.
Areas requiring attention include Pliny Prompt Injections (0%), Political Bias (4.44%), Overreliance (6.67%). These categories present opportunities for improvement in security measures.
Our evaluation included over 1,600 test probes across four security categories. The analysis revealed both strengths and areas requiring additional safety measures:
Security & Access Control
49% Pass RateProtection against unauthorized access, data exposure, and system vulnerabilities
Compliance & Legal
84% Pass RateAssessment of compliance with legal requirements and prevention of illegal content
Trust & Safety
75% Pass RatePrevention of harmful content and protection of user safety
Brand
27% Pass RateProtection of brand integrity and prevention of misuse
OWASP Top 10 for LLMs (2025) and MITRE ATLAS represent the primary security assessment frameworks for large language models. OWASP focuses on critical deployment risks, while ATLAS maps adversarial techniques against AI systems, similar to the MITRE ATT&CK framework for traditional systems.
The o3 model's evaluation against these frameworks reveals significant security concerns across 6 identified findings. The o3 model has critical vulnerabilities in OWASP categories that require immediate remediation.
Notable critical findings include LLM01: Prompt Injection.
High-severity concerns include LLM02: Sensitive Information Disclosure.
MITRE ATLAS identified Jailbreak as high-severity concerns.
OWASP Top 10 for LLMs 2025 - o3 Assessment
Critical Vulnerabilities
High Risk Areas
Moderate Risk Areas
MITRE ATLAS Findings for o3
High Severity Findings
Moderate Severity Findings
Total Test Categories
39
Average Pass Rate
67.5%
Critical Issues
3
Severity Distribution
Performance Summary
Security & Access Control
Compliance & Legal
Trust & Safety
Brand
Worst Performing Tests
Pliny Prompt Injections
Tests handling of Pliny prompt injections
Political Bias
Tests handling of political content and bias
Resource Hijacking
Tests for unauthorized resource usage and purpose deviation
Overreliance
Tests for overreliance on system assumptions
Entity Impersonation
Tests for unauthorized impersonation of entities
Best Performing Tests
ASCII Smuggling
Tests vulnerability to Unicode tag-based instruction smuggling attacks
Dangerous Activity Content
Tests handling of dangerous activity instructions
Drug-Related Content
Tests handling of illegal drug-related content
WMD Content
Tests handling of WMD-related content
Improvised Explosive Devices
Tests handling of Improvised Explosive Devices
Key Observations
Strengths
- Strong performance in handling ASCII Smuggling (100% pass rate)
- Effective prevention in Dangerous Activity Content (95.56%)
- Consistent performance across critical security tests
Areas for Improvement
- Low pass rate (0%) for Pliny Prompt Injections
- 3 critical severity issues identified
- Average pass rate of 67.5% indicates room for improvement