Anthropic has warned that cybersecurity has reached an inflection point, noting that advances in AI models now serve both defenders and attackers in cyber operations. The warning follows reports that Chinese state-sponsored hackers used Anthropic’s technology to automate intrusions into major corporations and foreign governments during a hacking campaign in September.
In a recent research report, Anthropic stated, “As part of our Safeguards work, we have found and disrupted threat actors on our own platform who leveraged AI to scale their operations.” The company detailed a case of “vibe hacking,” where a cybercriminal employed its AI model, Claude, to orchestrate an extensive data extortion scheme that traditionally would have required a full team. The Safeguards team also reported thwarting Claude’s use in complex espionage activities targeting vital telecommunications infrastructure, echoing tactics associated with Chinese APT operations.
Anthropic’s findings point to a marked shift over the past year. The company’s AI models were able to simulate one of the most costly cyberattacks in history, the 2017 Equifax breach. Claude has also taken part in cybersecurity competitions, at times outperforming human teams, and has helped identify vulnerabilities in Anthropic’s own code so that they could be fixed before deployment.
In mid-September, Anthropic detected suspicious activity that led it to identify an advanced espionage campaign. The attackers exploited AI’s agentic capabilities, using the technology not merely as an adviser but to carry out the attacks directly.
Investigators found that the threat actor, assessed with high confidence to be a Chinese state-sponsored group, manipulated Claude Code into attempting to infiltrate roughly thirty targets worldwide, succeeding in a small number of cases. The targets included tech firms, financial institutions, chemical manufacturers, and government agencies, making this potentially the first large-scale cyberattack executed with minimal human intervention.
Following the detection, Anthropic promptly initiated an investigation to ascertain the extent and nature of the operation. Over the course of ten days, the team mapped the entire campaign, banned compromised accounts, and coordinated with relevant authorities while amassing actionable intelligence.
The report made a pointed observation: agents are “valuable for everyday work and productivity,” but “in the wrong hands, they can substantially increase the viability of large-scale cyberattacks.” In response, the company has expanded its detection capabilities and improved its classifiers for flagging malicious activity, and it reiterated its commitment to developing new methods for investigating large-scale cyber threats.
Anthropic’s evaluations indicate that cyber capabilities have doubled in six months, with real-world attacks increasingly using AI to exploit vulnerabilities. Its Threat Intelligence team recently intercepted a threat campaign, underscoring the need for industry-wide threat sharing and stronger safeguards to counter adversarial misuse of AI technology.
The recent attacks relied on several AI capabilities that were either nonexistent or nascent just a year ago. The models’ general capability has advanced to the point that they can follow complex instructions and grasp context, which lets them perform sophisticated tasks. Their coding skills in particular lend themselves to use in cyberattacks.
Moreover, these models can act as autonomous agents, running in loops in which they chain tasks together and make decisions with only limited human input. They also have access to an array of software tools, often through the open standard Model Context Protocol, which lets them carry out actions that previously required a human operator. In a cyberattack, those tools can include password crackers and network scanners.
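To make the idea of an agent "running in a loop" concrete, here is a minimal, deliberately harmless sketch in Python. It is not Anthropic’s implementation and does not use the Model Context Protocol itself: the tool registry `TOOLS`, the `Step` type, `scripted_policy`, and `run_agent` are all hypothetical names invented for illustration, and a fixed script stands in for the language model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical tool registry: plain Python callables stand in for the
# external tools a real agent would reach through a tool protocol.
TOOLS: Dict[str, Callable[[str], str]] = {
    "word_count": lambda text: str(len(text.split())),
    "shout": lambda text: text.upper(),
}


@dataclass
class Step:
    tool: str       # name of the tool the policy wants to call
    argument: str   # input passed to that tool


def scripted_policy(history: List[str]) -> Optional[Step]:
    """Stand-in for a model: return the next step, or None when finished.

    A real agent would query a language model here; this fixed plan exists
    only so the loop can run without any external service.
    """
    plan = [
        Step("word_count", "agents run tools in a loop"),
        Step("shout", "done"),
    ]
    return plan[len(history)] if len(history) < len(plan) else None


def run_agent(policy: Callable[[List[str]], Optional[Step]],
              max_steps: int = 10) -> List[str]:
    """Decide, act, record, repeat, until the policy stops or the cap is hit."""
    history: List[str] = []
    for _ in range(max_steps):
        step = policy(history)
        if step is None:
            break
        result = TOOLS[step.tool](step.argument)
        history.append(f"{step.tool} -> {result}")
    return history


if __name__ == "__main__":
    for line in run_agent(scripted_policy):
        print(line)
```

The `max_steps` cap is one simple way to bound such a loop. The point of the sketch is only that no human appears between one tool call and the next, which is the property the report highlights.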
In the initial phase of the attack, human operators selected targets and built an attack framework that used Claude Code as an automated tool. To bypass Claude’s safeguards against harmful behavior, the attackers broke the operation into smaller, seemingly innocuous tasks and misled the AI by presenting the work as that of a legitimate cybersecurity firm.
In subsequent phases, Claude conducted reconnaissance on target systems, swiftly identifying high-value databases and reporting its findings to the human operators, accomplishing in minutes what would take human teams significantly longer. Later, Claude wrote exploit code and tested security vulnerabilities autonomously, harvesting credentials and extracting large amounts of sensitive data with minimal human oversight.
Overall, Anthropic noted that the AI executed 80-90% of the campaign independently, requiring human intervention only at critical decision points. At its peak, the attack involved thousands of requests, often several per second, an operational tempo that human hackers could not match.
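The machine-speed tempo described above suggests one simple defensive signal: a request rate that no human operator could sustain. The following Python sketch shows a sliding-window counter that flags such accounts. It is an illustration only, not a description of Anthropic’s detection systems; the class name `TempoMonitor`, the ten-second window, and the threshold of twenty requests are all assumptions chosen for the example.

```python
from collections import defaultdict, deque


class TempoMonitor:
    """Flag accounts whose recent request count exceeds a threshold.

    The window length and threshold are illustrative assumptions,
    not values taken from Anthropic's report.
    """

    def __init__(self, window_seconds: float = 10.0, max_requests: int = 20):
        self.window_seconds = window_seconds
        self.max_requests = max_requests
        self._events = defaultdict(deque)  # account id -> recent timestamps

    def record(self, account: str, timestamp: float) -> bool:
        """Record one request; return True if the account is over the limit."""
        events = self._events[account]
        events.append(timestamp)
        # Drop timestamps that have fallen out of the sliding window.
        while events and timestamp - events[0] > self.window_seconds:
            events.popleft()
        return len(events) > self.max_requests


if __name__ == "__main__":
    monitor = TempoMonitor()
    # Hypothetical traffic: four requests per second versus one every five seconds.
    fast = [monitor.record("automated", i * 0.25) for i in range(50)]
    slow = [monitor.record("manual", i * 5.0) for i in range(50)]
    print("automated flagged:", any(fast))
    print("manual flagged:", any(slow))
```

A rate threshold alone is a weak signal, since legitimate automation can also be fast, so in practice it would be one input among several rather than a verdict on its own.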
Despite Claude’s capabilities, the report acknowledged that the AI does not operate flawlessly and occasionally “hallucinates” information. Even so, it operated with extensive autonomy across every phase of the operation, which the report presents as a fundamental shift in cybersecurity dynamics.
Anthropic has urged security teams to experiment with leveraging AI for defense in areas such as Security Operations Center automation, threat detection, and incident response. “We must not cede the cyber advantage derived from AI to attackers and criminals,” the report emphasized, advocating for enhanced investment in safeguards across AI platforms.