Models & Research

New benchmark shows Claude Mythos and GPT-5.5 can develop real browser exploits autonomously

AI Quick Briefs Editorial Desk · May 16, 2026

What happened

Carnegie Mellon researchers created a new benchmark that tests how well AI agents can autonomously develop real exploits for Google’s V8 JavaScript engine. Two leading models, Claude Mythos and GPT-5.5, were evaluated. Mythos led by a large margin in exploit success, but at a cost twelve times higher than GPT-5.5’s.

Why it matters

This benchmark shows how capable AI has become at finding and exploiting real browser vulnerabilities without human help. In practice, that lowers the bar for automated attacks on critical software infrastructure. Security teams now have to prepare for threat actors who use AI to produce attack code efficiently, without needing exploit-development expertise of their own. At the same time, the cost gap points to a trade-off between capability and operating expense when deploying these models.

For defenders, it raises the urgency of automated detection methods that can keep pace with AI-generated exploits. For attackers willing to pay for more expensive AI, exploit development becomes faster and more effective, which shifts the threat landscape. It also shows AI models moving beyond language and automation tasks into active security offense, changing the security calculus for browser vendors and for the enterprises that rely on their browsers.
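
One way to reason about the cost trade-off described above is to compare expected spend per successful exploit rather than raw cost. The sketch below is illustrative only: the twelve-fold cost ratio comes from the brief, but the success rates are hypothetical placeholders (the brief does not report them), and the function names are made up for this example. It also assumes the cost ratio applies per attempt and that attempts are independent.

```python
# Illustrative sketch only. The 12x cost ratio is from the brief;
# every success rate used with these functions is a hypothetical input.

COST_RATIO = 12.0  # pricier model's cost per attempt, relative to the cheaper one


def cost_per_success(cost_per_attempt: float, success_rate: float) -> float:
    """Expected spend per successful exploit, assuming independent attempts."""
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_attempt / success_rate


def pricier_model_is_cheaper_per_success(
    cheap_rate: float, pricey_rate: float, cost_ratio: float = COST_RATIO
) -> bool:
    """True if the pricier model costs less per success than the cheaper one.

    The cheaper model's cost per attempt is normalized to 1.0.
    """
    return cost_per_success(cost_ratio, pricey_rate) < cost_per_success(1.0, cheap_rate)
```

Under these assumptions, the pricier model is the better deal per success only when its success rate is more than twelve times the cheaper model's. With a hypothetical 5% versus 90%, it comes out ahead; at 10% versus 90%, it does not.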

What to watch next

Expect further refinement of benchmarks that measure AI-driven exploit development against complex, real-world software. Watch updates to both Claude Mythos and GPT-5.5 for gains in efficiency and success rate. Security defenders and incident responders should track how well AI tools craft sophisticated exploits autonomously, since that will shorten patching timelines and increase pressure on automated defenses. Investors and builders in cybersecurity AI will want to see how cost-effectiveness changes as models with offensive capabilities grow sharper and potentially cheaper.

