
Anthropic’s Claude Mythos Preview: A Double-Edged Sword in Cybersecurity and the Call for Broader Oversight

Last week, Anthropic unveiled Claude Mythos Preview, an artificial intelligence model with capabilities for identifying and exploiting software vulnerabilities so advanced that the company deemed it too dangerous for widespread public release. Instead, access to the model has been restricted to roughly 50 select organizations, including industry giants such as Microsoft, Apple, and Amazon Web Services, the cybersecurity firm CrowdStrike, and other key vendors of critical infrastructure, all operating under an initiative dubbed Project Glasswing. Anthropic’s decision has ignited a fierce debate within the cybersecurity and AI communities over the responsible development and deployment of increasingly powerful AI systems.

Unprecedented Capabilities and the Ethical Dilemma

The announcement of Claude Mythos Preview was accompanied by a series of striking anecdotes that underscore its formidable prowess. Reports detailed the discovery of thousands of vulnerabilities across virtually every major operating system and web browser. Among the most alarming finds were a 27-year-old flaw in OpenBSD and a 16-year-old vulnerability in FFmpeg, highlighting Mythos’s capacity to unearth deeply entrenched and long-overlooked weaknesses. Perhaps most tellingly, Mythos weaponized a set of vulnerabilities it identified in the Firefox browser into 181 distinct, usable attacks, whereas Anthropic’s previous flagship model could produce only two working exploits from similar findings. That is roughly a ninety-fold jump in offensive capability, not an incremental improvement.

The core dilemma presented by Mythos lies in its dual nature: it represents a significant advancement in defensive cybersecurity, offering an unprecedented tool for proactively identifying and patching critical bugs, yet simultaneously embodies a potent offensive weapon. From one perspective, Anthropic’s decision to restrict access can be seen as a commendable act of responsible disclosure, a practice long advocated by security researchers who urge companies to report vulnerabilities before malicious actors exploit them. By engaging key infrastructure providers directly, Anthropic aims to facilitate swift patching and bolster global digital defenses. However, this private, unilateral decision by a single corporation to control access to such a powerful tool raises profound questions about transparency, equity, and the broader societal implications.

The Genesis of Project Glasswing

Anthropic, a company founded by former OpenAI researchers, has distinguished itself with a strong emphasis on AI safety and "constitutional AI," aiming to build systems that are helpful, harmless, and honest. Project Glasswing, therefore, is presented as a manifestation of these core principles. The initiative’s goal is to leverage Mythos’s capabilities to identify and mitigate vulnerabilities within critical software infrastructure before they can be exploited by adversaries. The chosen 50 organizations are strategic partners, representing key pillars of the global digital ecosystem—from cloud providers to operating system developers and cybersecurity firms—whose immediate access to Mythos’s insights could theoretically yield the most widespread defensive benefits. This approach is designed to create a "patch first" environment, allowing major vendors to address critical bugs in their systems proactively.

However, the very nature of this restricted access, while seemingly pragmatic for immediate threat mitigation, inherently limits the diversity of expertise and scrutiny applied to the model’s findings. The cybersecurity landscape is vast and complex, encompassing a myriad of niche systems, legacy technologies, and specialized domains that may fall outside the purview or immediate priorities of even the most prominent tech companies.

The Transparency Gap and the Challenge of Evaluation

Despite the dramatic demonstrations of Mythos’s capabilities, the public and the broader cybersecurity community have been provided with remarkably limited data to independently evaluate Anthropic’s decision and the model’s true performance. The showcased successes, while impressive, form a "highlight reel" that lacks the comprehensive context needed for a full assessment.

A critical piece of missing information is the model’s false positive rate. Anthropic reported that security contractors agreed with the AI’s severity ratings in 198 cases, an agreement rate of 89%. While this figure appears impressive, it is incomplete. Independent research into similar AI models has revealed a common challenge: systems tuned to detect nearly every genuine bug also tend to "hallucinate" plausible-sounding vulnerabilities in code that is, in fact, patched and correct. These false alarms are not merely an inconvenience; they can significantly dilute the value of an AI’s output. A model that generates thousands of false alarms, even if it finds hundreds of real vulnerabilities, still demands extensive human review and validation, consuming significant skilled labor and resources. Without knowing the rate of false alarms in Mythos’s unfiltered output, it is difficult to ascertain whether the showcased examples are representative of its overall performance or carefully selected successes.
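To see why the false positive rate matters so much, consider a back-of-the-envelope triage calculation. The numbers below are purely hypothetical, since nothing in Anthropic’s disclosures specifies them; the sketch only shows how quickly review costs compound when precision is low:

```python
# Hypothetical triage-cost sketch. None of these figures come from
# Anthropic's disclosures; they exist only to illustrate why the
# false-alarm rate matters as much as the raw count of real findings.

true_findings = 500      # assumed: genuine vulnerabilities surfaced
false_alarms = 4_500     # assumed: plausible but bogus reports
hours_per_review = 2.0   # assumed: skilled-analyst hours per report

total_reports = true_findings + false_alarms
precision = true_findings / total_reports
review_hours = total_reports * hours_per_review

print(f"Precision: {precision:.0%}")                    # Precision: 10%
print(f"Analyst-hours to triage: {review_hours:,.0f}")  # Analyst-hours to triage: 10,000
```

At these assumed rates, each real vulnerability costs roughly twenty analyst-hours of triage, which is precisely the kind of figure the publicly released highlights do not allow the community to compute.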


This distinction is vital. An AI capable of autonomously identifying and exploiting hundreds of vulnerabilities with near-inhuman precision would indeed be a transformative game-changer. However, a model that also generates an overwhelming volume of non-working attacks or misidentified flaws, while still useful, fundamentally alters the human-AI partnership dynamic, requiring a different level of human oversight and expertise. The lack of comprehensive, independently verifiable metrics prevents a clear understanding of where Mythos falls on this spectrum.

The Asymmetry of Access and Expertise

A second, more subtle, yet equally profound concern arises from the inherent biases in large language models (LLMs), including Mythos. These models typically perform best on inputs that closely resemble the data they were trained on—predominantly widely used open-source projects, major web browsers, the Linux kernel, and popular web frameworks. This makes the concentration of early access among the largest vendors of precisely this kind of software a logical, even sensible, strategy for immediate defensive gains. It enables these entities to patch vulnerabilities in their most widely used products before malicious actors can exploit them.

However, the inverse of this principle is equally true and far more troubling. Software ecosystems that fall outside this training distribution—such as industrial control systems (ICS), medical device firmware, bespoke financial infrastructure, regional banking software, or older embedded systems—are precisely the areas where an out-of-the-box Mythos is likely to be least effective in autonomously finding or exploiting bugs. These specialized domains often rely on proprietary code, obscure programming languages, unique architectures, and limited publicly available documentation, making them challenging for a general-purpose AI trained on common data.

The danger here is not that Mythos fails in these specialized domains, but rather that it could succeed for an attacker who possesses the requisite domain expertise. A sufficiently motivated and knowledgeable attacker could wield Mythos’s advanced reasoning capabilities as a force multiplier, guiding the AI to probe systems that Anthropic’s own engineers might lack the specialized knowledge to audit effectively. In such scenarios, the AI becomes a powerful tool in the hands of an expert, potentially uncovering vulnerabilities in critical infrastructure that are both obscure and highly impactful. This creates a dangerous asymmetry where critical systems, often underpinning national security and public safety, remain vulnerable due to a lack of shared access to advanced defensive tools and specialized human expertise.

Calls for Broader Governance and Community Involvement

The current situation, where a private company makes unilateral decisions about which pieces of critical global infrastructure receive priority defense, raises serious questions. Anthropic, despite its good intentions and commitment to safety, operates with finite staff, budget, and expertise. It is inevitable that certain vulnerabilities or entire classes of systems will be overlooked, and when these missed elements pertain to software controlling hospitals, power grids, or transportation networks, the societal cost could be immense, borne by individuals who had no say in the decision-making process.


Moreover, the problem extends beyond Anthropic and Mythos. OpenAI, another leading AI developer, has similarly announced a staggered rollout of its new GPT-5.4-Cyber model due to perceived cybersecurity risks, suggesting that Mythos Preview is not unique but rather indicative of a broader trend in advanced AI development. This highlights a systemic challenge facing the AI industry and global society. Interestingly, some security companies, such as Aisle, have even claimed to replicate many of Anthropic’s published anecdotes using smaller, cheaper, and publicly available AI models, further underscoring the need for independent verification and challenging the notion of exclusive, unreplicable power in these advanced models.

The implications of these powerful models extend to national security, personal safety, and corporate competitiveness. Any technology capable of identifying thousands of exploitable flaws in the foundational systems we all depend on should not be governed solely by the internal judgment of its creators, no matter how well-intentioned they may be.

This situation underscores the urgent need for a more globally coordinated and transparent framework for the development and deployment of such potent AI. While regulation is likely an inevitable outcome, and a necessary one, it will be a complex and lengthy process requiring extensive consultation and feedback. In the short term, immediate steps are required:

  1. Greater Transparency and Information Sharing: This does not necessarily mean making powerful models like Claude Mythos widely available. Instead, it calls for sharing as much data and information as possible regarding the models’ capabilities, limitations (including false positive rates), and methodologies. This would enable the broader community to collectively make informed decisions and contribute to defense efforts.
  2. Globally Coordinated Frameworks for Independent Auditing: External, impartial auditing bodies are crucial for verifying claims, assessing risks, and ensuring accountability.
  3. Mandatory Disclosure of Aggregate Performance Metrics: Standardized and publicly accessible metrics would allow for objective comparison and evaluation across different AI models and developers.
  4. Funded Access for Academic and Civil-Society Researchers: Providing structured access to academic researchers and domain specialists—such as cardiologists partnering on medical device security, control-systems engineers, or researchers focusing on less prominent languages and ecosystems—would meaningfully reduce the current asymmetry. Fifty companies, however well-chosen, cannot substitute for the distributed expertise of the entire global research community. These groups can bring specialized knowledge to bear on critical, often overlooked, sectors.

Until such changes are implemented, each release of a "Mythos-class" AI model will push the world to the edge of a new precipice. Without greater visibility into these powerful technologies and a more inclusive governance model, society remains largely blind to both the risks and the efficacy of the proposed safeguards. In a democratic society, decisions about technologies that can determine the security of our most critical infrastructure are too consequential to be left solely to the internal discretion of for-profit corporations. Society must be empowered to make informed choices about its own security, rather than having those choices made implicitly by the developers of these advanced tools.

This essay was written with David Lie, and originally appeared in The Globe and Mail.
