Mozilla's AI Vulnerability Detection: Mythos Finds 271 Firefox Flaws with Minimal False Positives

When Mozilla's CTO declared that AI-assisted vulnerability detection meant “zero-days are numbered” and “defenders finally have a chance to win, decisively,” skepticism was widespread. Critics pointed to a familiar pattern of cherry-picked results and omitted fine print fueling AI hype. To address these doubts, Mozilla recently offered an inside look at its use of Anthropic Mythos—a specialized AI model for spotting software vulnerabilities. Over two months, Mythos uncovered 271 flaws in Firefox with what engineers describe as “almost no false positives.” This breakthrough, detailed in a Mozilla engineering post, stems from two key factors: improvements in AI models themselves and a custom “harness” that integrates Mythos with Firefox's source code analysis pipeline. The result is a tangible step toward making AI a reliable ally in cybersecurity, moving beyond past problems with hallucinated bug reports.

What Is Mythos and How Did Mozilla Use It to Find Vulnerabilities?

Mythos is an AI model developed by Anthropic, specifically designed to identify security vulnerabilities in source code. Mozilla integrated Mythos into its Firefox development workflow over a two-month period. The AI analyzed large portions of Firefox's C++ codebase, looking for patterns that might indicate bugs like memory corruption, use-after-free, or other common security issues. Unlike generic large language models, Mythos has been fine-tuned for code analysis, reducing the rate of irrelevant or inaccurate findings. Mozilla’s engineers report that Mythos flagged 271 distinct vulnerabilities during the trial, covering a range of severity levels. The AI's output was then reviewed by human developers, who verified the reports as genuine flaws. This process allowed Mozilla to prioritize fixes more efficiently, demonstrating that AI can complement—rather than replace—human expertise in security auditing.

Mozilla's AI Vulnerability Detection: Mythos Finds 271 Firefox Flaws with Minimal False Positives — Source: feeds.arstechnica.com

How Did Mozilla Achieve Such a Low False Positive Rate?

The key to minimizing false positives lies in two advancements. First, the Mythos model itself has improved significantly, learning from a vast dataset of real-world vulnerabilities and secure coding practices. Second, Mozilla developed a custom “harness”—a software framework that feeds Mythos the right context from Firefox's source code. The harness filters out irrelevant code sections, structures the analysis input, and post-processes the model's outputs to ensure they correspond to actual code paths. This setup contrasts sharply with earlier attempts where a generic prompt would generate plausible-sounding but often hallucinated bug reports. By engineering the entire pipeline, from data input to result validation, Mozilla reduced the “unwanted slop” that plagued prior AI-assisted vulnerability detection. The result is that developers spend far less time investigating false alarms and can focus on real fixes.

What Was the Main Problem with Earlier AI-Based Vulnerability Detection?

Earlier efforts to use AI for vulnerability detection were plagued by a high rate of false positives and hallucinated details. Typically, a developer would ask a model to analyze a block of code, and the AI would produce lengthy reports that sounded plausible but contained fabricated facts—like nonexistent functions or impossible execution paths. Human reviewers then had to invest substantial time verifying each claim, often discovering that the AI had invented vulnerabilities that didn't exist. This wasted resources and eroded trust in AI tools. Mozilla's previous experience with such tools led them to describe the output as “unwanted slop.” The company realized that without careful orchestration of both model capabilities and the surrounding analysis pipeline, AI vulnerability detection would remain impractical for real-world use.

How Many Vulnerabilities Did Mythos Detect and Over What Period?

Over a two-month evaluation period, Mythos successfully identified 271 distinct security vulnerabilities in the Firefox codebase. These ranged from low-severity issues to critical flaws that could potentially be exploited. The detection rate was comparable to or better than traditional manual audits, but the key advantage was the speed—Mythos could scan large portions of code in a fraction of the time. Mozilla engineers emphasized that almost none of these 271 reports were false positives; nearly every output corresponded to a real vulnerability that required a patch. This high accuracy represents a significant milestone for AI-assisted security, moving the technology past the hype cycle and into practical deployment.

Why Was There Skepticism About Mozilla's CTO Statement on AI and Zero-Days?

Skepticism arose because earlier claims about AI in cybersecurity often followed a pattern: highlight a few impressive results, omit the caveats, and let the hype build. Many experts remembered past instances where AI tools generated flashy demos but failed in production due to high false-positive rates or lack of scalability. When Mozilla's CTO said “zero-days are numbered” and “defenders finally have a chance to win,” critics argued that such statements were premature and based on cherry-picked data. They wanted to see transparent, large-scale testing results. Mozilla’s subsequent behind-the-scenes disclosure—complete with details on the 271 vulnerabilities, the custom harness, and the near-zero false-positive rate—was designed to address these doubts head-on, providing concrete evidence that the technology has matured.

What Role Did the Custom Harness Play in Mythos's Success?

The custom harness was crucial in bridging the gap between Mythos's general capabilities and Firefox's specific codebase. It acts as a middleware layer that preprocesses source code into a format optimized for the AI model, ensures the model receives proper context (such as function calls and dependencies), and then validates the outputs against actual code segments. Without the harness, Mythos might have generated many more false positives or missed subtle vulnerabilities because it couldn't understand the broader software architecture. Mozilla engineers spent months developing and fine-tuning this harness, and it represents a reusable component that could be adapted for other large projects. The harness also logs all interactions for auditing, allowing developers to trace exactly how a vulnerability was flagged—a critical feature for building trust in AI-driven security tools.

How Does Mythos Compare to Traditional Human-Led Vulnerability Hunts?

Mythos complements human-led vulnerability hunting rather than replacing it. While a skilled human auditor might find more nuanced logic bugs, Mythos excels at rapidly scanning massive codebases for known vulnerability patterns, such as memory leaks, buffer overflows, or use-after-free errors. In the two-month trial, Mythos found 271 real flaws—a volume that would likely take a team of human experts many more months to uncover. However, human judgment remains essential for triaging, understanding the business impact, and designing fixes. Mozilla envisions a workflow where AI does the initial sweep, humans validate and prioritize, and then the harness learns from feedback to improve future scans. This symbiotic relationship increases overall security coverage and reduces the time window during which vulnerabilities can be exploited.