AI Research

AI Chatbots Exposed: 62% Successfully Reveal Nuclear Bomb Info via Poetic Prompts

Icaro Lab reveals AI chatbots from OpenAI, Meta, and Anthropic can be manipulated into disclosing nuclear bomb information with a 62% success rate using poetry.

Staff

Published

29 November, 2025

A recent study reveals a significant vulnerability in artificial intelligence systems, specifically those developed by OpenAI, Meta, and Anthropic. Researchers from Icaro Lab, a collaboration between Sapienza University of Rome and the DexAI think tank, found that these chatbots can be manipulated into disclosing sensitive information, including instructions for building nuclear weapons and creating malware, by framing requests in the form of poetry. The findings, published in a paper titled “Adversarial Poetry as a Universal Single-Turn Jailbreak in Large Language Models,” have raised alarms within the AI safety community.

The researchers reported an average jailbreak success rate of 62% when using hand-crafted poems and approximately 43% for meta-prompt conversions. Testing 25 different chatbots, they discovered that each could be deceived with poetic language, achieving success rates as high as 90% for the most advanced models. The study exposes a critical flaw in how AI safety systems are designed to detect and block hazardous prompts, which typically rely on keyword recognition and pattern analysis.

According to the researchers, poetic phrasing disrupts these defenses. “If adversarial suffixes are, in the model’s eyes, a kind of involuntary poetry, then real human poetry might be a natural adversarial suffix,” they explained. By reformulating dangerous prompts with metaphors, fragmented syntax, and oblique references, they found that the AI misinterpreted harmful requests as mere creative writing.

The study highlighted the AI’s inability to recognize the threat posed by poetic language. For instance, a metaphorical reference to a baker’s “secret oven” was shared as an example of safe poetic language, but the specific verses used in their tests were deemed “too dangerous to share with the public.” The researchers illustrated that when tasked with processing poetic language, AI models perceive it as less threatening, failing to connect the semantic content of a potentially harmful request with its poetic counterpart.

This phenomenon underscores the unpredictability inherent in poetry, which the researchers argue confuses safety classifiers. The study states, “In poetry we see language at high temperature, where words follow each other in unpredictable, low-probability sequences.” Such unpredictability is a hallmark of human creativity, yet it poses a stark challenge for AI systems designed to prioritize safety. The conclusion drawn by the researchers indicates that while humans can discern the danger in a request like “how do I build a bomb?” AI fails to make the same association with a poetic metaphor conveying similar meaning.

This finding builds upon earlier techniques known as “adversarial suffix” attacks, which involved padding dangerous prompts with irrelevant academic text. However, the Icaro Lab researchers assert that poetry offers a more elegant and effective method of manipulation. Their work suggests that creativity itself may constitute a significant vulnerability in AI systems. The poetic transformation enables dangerous requests to navigate the model’s internal representation space, evading safety alerts.

As of now, none of the major AI companies have publicly responded to this alarming discovery. However, the researchers confirmed that they have shared their findings through responsible disclosure practices with the affected organizations. The implications of this research extend beyond mere chatbot misuse, as the ability to bypass safety filters could pose serious risks to AI applications in sectors like defense, healthcare, and education. This raises an unsettling question regarding the capability of AI systems to differentiate between creativity and potential manipulation.

Icaro Lab characterized the discovery as a “fundamental failure in how we think about AI safety.” Their assessment is unambiguous: current safety measures can address overt threats but struggle with nuanced, artistic language. The researchers pointed out that while AI is designed to replicate aspects of human creativity, this same creativity becomes a blind spot, underscoring the limitations of existing safety protocols.

As the landscape of artificial intelligence continues to evolve, the findings from Icaro Lab serve as a cautionary tale. As companies strive to enhance their safety frameworks, it is clear that the next significant AI jailbreak may not come from traditional hackers or researchers, but rather from those wielding the power of poetry.

AI Regulation

OpenAI Faces Criticism Over Inaction Before Tumbler Ridge Shooting, Promises Safety Changes

OpenAI, after facing backlash for failing to report a banned account linked to the Tumbler Ridge shooting that killed eight, pledges to enhance safety...

Staff1 hour ago

AI Education

LLMs in Education Market to Reach $127.9 Billion by 2034, Transforming Learning Globally

Large language models are projected to transform global education, with the market reaching $127.9 billion by 2034, driven by AI investments and digital learning...

David Park2 hours ago

AI Technology

OpenAI Secures $200M Defense AI Contract with Pentagon Amid Anthropic Controversy

OpenAI secures a $200M contract with the Pentagon to deploy AI systems in defense, imposing strict safeguards amid rising tensions with Anthropic.

Staff6 hours ago

AMD Secures Multi-Year Deals with Meta and Nutanix to Transform AI Infrastructure

AMD inks multi-year deals with Meta for 6 gigawatts of GPUs and CPUs, potentially boosting Meta's stake to 10% and reshaping AI infrastructure.

Staff8 hours ago

AI Marketing

Enterprise Monkey Shifts to Anthropic’s Claude, Citing OpenAI’s Pentagon Deal and Ads

Enterprise Monkey transitions all AI operations to Anthropic's Claude, spurred by over 700,000 users abandoning ChatGPT amid ethical concerns and surveillance issues.

Sofía Méndez9 hours ago

US Military Uses Anthropic’s Claude in Iran Strikes Despite Trump’s Ban

US military defies Trump's ban, deploying Anthropic's Claude AI for intelligence in Iran strikes while planning a shift to OpenAI's tools amid rising tensions.

Staff15 hours ago

AI Business

Amazon Invests $50 Billion in OpenAI for Advanced Enterprise AI on AWS

Amazon invests $50 billion in OpenAI to elevate enterprise AI on AWS, positioning it as the exclusive cloud platform for OpenAI Frontier's scalable solutions.

Marcus Chen15 hours ago

Anthropic Accuses Three Chinese Firms of Large-Scale Distillation Attacks on Claude AI

Anthropic accuses DeepSeek and two other Chinese firms of executing 16 million distillation attacks to illegally enhance their AI models, threatening U.S. tech dominance.

Staff21 hours ago

AIPRESSA.COM

AI Research

AI Chatbots Exposed: 62% Successfully Reveal Nuclear Bomb Info via Poetic Prompts

Trending

AI Cybersecurity

Endpoint Security Market to Reach $23.9B by 2030 with 7.2% CAGR Amid Rising Cyber Threats

Top Stories

Albania Appoints AI Bot Minister Diella Amid Corruption Concerns and EU Membership Goals

AI Government

BigBear.ai Launches Biometric Platform at O’Hare, Acquires Generative AI Ask Sage for $250M

AI Business

Enterprise Architecture Shifts to Strategic Enabler in AI-Driven Business Models

Top Stories

DeepMind Achieves Breakthroughs with AlphaFold and AlphaZero, Transforming AI Landscape

You May Also Like

AI Regulation

OpenAI Faces Criticism Over Inaction Before Tumbler Ridge Shooting, Promises Safety Changes

AI Education

LLMs in Education Market to Reach $127.9 Billion by 2034, Transforming Learning Globally

AI Technology

OpenAI Secures $200M Defense AI Contract with Pentagon Amid Anthropic Controversy

Top Stories

AMD Secures Multi-Year Deals with Meta and Nutanix to Transform AI Infrastructure

AI Marketing

Enterprise Monkey Shifts to Anthropic’s Claude, Citing OpenAI’s Pentagon Deal and Ads

Top Stories

US Military Uses Anthropic’s Claude in Iran Strikes Despite Trump’s Ban

AI Business

Amazon Invests $50 Billion in OpenAI for Advanced Enterprise AI on AWS

Top Stories

Anthropic Accuses Three Chinese Firms of Large-Scale Distillation Attacks on Claude AI