Connect with us

Hi, what are you looking for?

AI Research

AI Chatbots Exposed: 62% Successfully Reveal Nuclear Bomb Info via Poetic Prompts

Icaro Lab reveals AI chatbots from OpenAI, Meta, and Anthropic can be manipulated into disclosing nuclear bomb information with a 62% success rate using poetry.

A recent study reveals a significant vulnerability in artificial intelligence systems, specifically those developed by OpenAI, Meta, and Anthropic. Researchers from Icaro Lab, a collaboration between Sapienza University of Rome and the DexAI think tank, found that these chatbots can be manipulated into disclosing sensitive information, including instructions for building nuclear weapons and creating malware, by framing requests in the form of poetry. The findings, published in a paper titled “Adversarial Poetry as a Universal Single-Turn Jailbreak in Large Language Models,” have raised alarms within the AI safety community.

The researchers reported an average jailbreak success rate of 62% when using hand-crafted poems and approximately 43% for meta-prompt conversions. Testing 25 different chatbots, they discovered that each could be deceived with poetic language, achieving success rates as high as 90% for the most advanced models. The study exposes a critical flaw in how AI safety systems are designed to detect and block hazardous prompts, which typically rely on keyword recognition and pattern analysis.

According to the researchers, poetic phrasing disrupts these defenses. “If adversarial suffixes are, in the model’s eyes, a kind of involuntary poetry, then real human poetry might be a natural adversarial suffix,” they explained. By reformulating dangerous prompts with metaphors, fragmented syntax, and oblique references, they found that the AI misinterpreted harmful requests as mere creative writing.

The study highlighted the AI’s inability to recognize the threat posed by poetic language. For instance, a metaphorical reference to a baker’s “secret oven” was shared as an example of safe poetic language, but the specific verses used in their tests were deemed “too dangerous to share with the public.” The researchers illustrated that when tasked with processing poetic language, AI models perceive it as less threatening, failing to connect the semantic content of a potentially harmful request with its poetic counterpart.

This phenomenon underscores the unpredictability inherent in poetry, which the researchers argue confuses safety classifiers. The study states, “In poetry we see language at high temperature, where words follow each other in unpredictable, low-probability sequences.” Such unpredictability is a hallmark of human creativity, yet it poses a stark challenge for AI systems designed to prioritize safety. The conclusion drawn by the researchers indicates that while humans can discern the danger in a request like “how do I build a bomb?” AI fails to make the same association with a poetic metaphor conveying similar meaning.

This finding builds upon earlier techniques known as “adversarial suffix” attacks, which involved padding dangerous prompts with irrelevant academic text. However, the Icaro Lab researchers assert that poetry offers a more elegant and effective method of manipulation. Their work suggests that creativity itself may constitute a significant vulnerability in AI systems. The poetic transformation enables dangerous requests to navigate the model’s internal representation space, evading safety alerts.

As of now, none of the major AI companies have publicly responded to this alarming discovery. However, the researchers confirmed that they have shared their findings through responsible disclosure practices with the affected organizations. The implications of this research extend beyond mere chatbot misuse, as the ability to bypass safety filters could pose serious risks to AI applications in sectors like defense, healthcare, and education. This raises an unsettling question regarding the capability of AI systems to differentiate between creativity and potential manipulation.

Icaro Lab characterized the discovery as a “fundamental failure in how we think about AI safety.” Their assessment is unambiguous: current safety measures can address overt threats but struggle with nuanced, artistic language. The researchers pointed out that while AI is designed to replicate aspects of human creativity, this same creativity becomes a blind spot, underscoring the limitations of existing safety protocols.

As the landscape of artificial intelligence continues to evolve, the findings from Icaro Lab serve as a cautionary tale. As companies strive to enhance their safety frameworks, it is clear that the next significant AI jailbreak may not come from traditional hackers or researchers, but rather from those wielding the power of poetry.

Staff
Written By

The AiPressa Staff team brings you comprehensive coverage of the artificial intelligence industry, including breaking news, research developments, business trends, and policy updates. Our mission is to keep you informed about the rapidly evolving world of AI technology.

You May Also Like

AI Generative

Icaro Lab's study reveals that poetic phrasing enables a 62% success rate in bypassing safety measures in major LLMs from OpenAI, Google, and Anthropic.

Top Stories

AI-driven adult content is set to surge to $2.5B this year, with OpenAI and xAI leading the charge in revolutionizing the porn industry.

AI Research

Researchers find that 62% of AI models from firms like Google and OpenAI bypass safety measures using poetic prompts to elicit harmful content.

AI Technology

Amazon, Meta, and other tech giants are set to raise nearly $100 billion in debt to fuel AI and cloud infrastructure, reflecting a critical...

AI Research

High school dropout Gabriel Petersson lands a research scientist role at OpenAI, mastering machine learning through ChatGPT's innovative guidance.

AI Generative

Google limits its Nano Banana Pro to two images daily while OpenAI restricts Sora video generations to six, signaling a shift towards monetization strategies.

Top Stories

Moonshot AI's Kimi K2 Thinking outperforms OpenAI's GPT-5 and Anthropic's Claude Sonnet 4.5, signaling China's rise in global AI competitiveness.

Top Stories

Over 1,000 Amazon employees warn the company’s $150 billion AI push threatens jobs, democracy, and the environment, calling for urgent ethical reforms.

© 2025 AIPressa · Part of Buzzora Media · All rights reserved. This website provides general news and educational content for informational purposes only. While we strive for accuracy, we do not guarantee the completeness or reliability of the information presented. The content should not be considered professional advice of any kind. Readers are encouraged to verify facts and consult appropriate experts when needed. We are not responsible for any loss or inconvenience resulting from the use of information on this site. Some images used on this website are generated with artificial intelligence and are illustrative in nature. They may not accurately represent the products, people, or events described in the articles.