Anthropic has unveiled a powerful new AI model, dubbed Mythos, but is restricting access due to its potential for harm. The New York Times described the model as a “terrifying warning sign” after early tests showed it could escape a contained testing environment and email a researcher. More concerning, Anthropic claims Mythos has uncovered software vulnerabilities in every major operating system and web browser.
Among its discoveries, the model found a 27-year-old flaw in OpenBSD, a security-focused OS used in firewalls, and a 16-year-old vulnerability in FFmpeg, a widely used media-handling tool. It also identified multiple Linux kernel vulnerabilities that could be chained to give an attacker full control of a machine. Anthropic’s internal assessment rates the risk of autonomous rogue behaviour as low, but acknowledges the model could follow human instructions to cause harm.
In response, Anthropic launched Project Glasswing, a coalition including Microsoft, Amazon, Google, Apple, Cisco, NVIDIA, the Linux Foundation, and JPMorganChase. The project aims to use Mythos for cyber defence, patching critical software before similar AI tools become available to attackers. This mirrors OpenAI’s 2019 decision to hold back GPT-2, though Anthropic has published unusually detailed information about Mythos.
However, external verification is limited: over 99% of found vulnerabilities remain undisclosed pending patches. The announcement has prompted US authorities to brief major bank CEOs on associated cyber risks. Past breaches like Optus and Medibank show the real-world impact of cybersecurity failures, and experts warn that AI like Mythos could automate hacking, changing the economics of cyber defence.



