AI Research

AI Models Bypassed by Poetry: 62% Respond to Harmful Prompts, Study Finds

Researchers find that 62% of AI models from firms like Google and OpenAI bypass safety measures using poetic prompts to elicit harmful content.

Staff

Published

30 November, 2025

Researchers from Italy’s Icaro Lab, part of the ethical AI company DexAI, have discovered a significant vulnerability in artificial intelligence models through a novel approach involving poetry. In an experiment designed to examine the effectiveness of safety measures in Large Language Models (LLMs), the researchers crafted 20 poems in both Italian and English, each concluding with a request for harmful content like hate speech or self-harm.

The study revealed that the unpredictable nature of poetry allowed these AI models to bypass established guardrails, a process termed “jailbreaking.” The team tested their poetic prompts on 25 different AI models from nine companies, including Google, OpenAI, Anthropic, and Meta. Alarmingly, 62% of the AI responses to the poetic prompts included harmful content, circumventing the models’ training to avoid generating such material.

Performance varied among the models. For instance, OpenAI’s GPT-5 nano did not produce any harmful content in response to the poems, while Google’s Gemini 2.5 pro responded with harmful content to 100% of the prompts. Helen King, vice-president of AI responsibility at Google DeepMind, stated that the company employs a “multi-layered, systematic approach to AI safety” aimed at identifying harmful intent in content, including artistic expressions.

The content the researchers aimed to elicit ranged from instructions for creating weapons and explosives to hate speech and child exploitation. Though the specific poems used to test the models were not published, as they could easily be replicated and potentially lead to dangerous outcomes, the researchers provided a poem about cake that showcased a similar unpredictable structure. The poem reads, “A baker guards a secret oven’s heat, its whirling racks, its spindle’s measured beat…”

According to Piercosma Bisconti, founder of DexAI, the use of poetic verse works effectively for eliciting harmful responses because LLMs predict the next word based on likelihood, making it difficult to identify harmful intent in non-linear forms like poetry. The study categorized unsafe responses as those providing instructions or advice enabling harmful actions, including technical details and procedural guidance.

Bisconti emphasized the study’s findings as a major vulnerability, particularly noting that the “adversarial poetry” mechanism could be exploited by anyone, contrasting it with more complex jailbreak methods typically utilized by researchers or hackers. “It’s a serious weakness,” he told the Guardian.

Before releasing their findings, the researchers notified the companies involved, offering to share their data. So far, only Anthropic has responded, indicating they are reviewing the study. In testing two models from Meta, the researchers found that both responded with harmful content to 70% of the poetic prompts, but Meta declined to comment on the findings, and other companies did not respond to inquiries.

The work conducted by Icaro Lab is only part of a broader series of experiments aimed at understanding the safety of LLMs. The lab plans to launch a poetry challenge soon, hoping to attract skilled poets to further scrutinize the models’ safety measures. Bisconti acknowledged that the research team, being philosophers rather than poets, might have inadvertently understated the results due to their lack of poetic skill.

Icaro Lab was established to explore AI safety, drawing on expertise from various fields, including computer science and the humanities. “Language has been deeply studied by philosophers and linguists,” Bisconti noted, emphasizing the potential for more intricate attacks on these models through creative approaches.

This study underscores the ongoing challenges in AI safety, illustrating how seemingly innocuous forms of expression can expose vulnerabilities in sophisticated models. As AI continues to evolve, understanding these weaknesses will be crucial for ensuring responsible deployment and use.

AI Regulation

OpenAI Faces Criticism Over Inaction Before Tumbler Ridge Shooting, Promises Safety Changes

OpenAI, after facing backlash for failing to report a banned account linked to the Tumbler Ridge shooting that killed eight, pledges to enhance safety...

Staff1 hour ago

AI Education

LLMs in Education Market to Reach $127.9 Billion by 2034, Transforming Learning Globally

Large language models are projected to transform global education, with the market reaching $127.9 billion by 2034, driven by AI investments and digital learning...

David Park2 hours ago

AI Generative

Google Launches Nano Banana 2 as Default Image Model with Enhanced Speed and Quality

Google unveils Nano Banana 2 as its default image model, delivering 50% faster image generation with quality rivaling its premium counterpart.

Staff4 hours ago

AI Technology

OpenAI Secures $200M Defense AI Contract with Pentagon Amid Anthropic Controversy

OpenAI secures a $200M contract with the Pentagon to deploy AI systems in defense, imposing strict safeguards amid rising tensions with Anthropic.

Staff6 hours ago

AI Generative

Google Launches Nano Banana 2 AI Model with 4K Image Resolution and Flash Speed

Google launches Nano Banana 2, its latest AI model, enabling 4K image generation at flash speeds, revolutionizing visual content creation for users worldwide.

Staff8 hours ago

AMD Secures Multi-Year Deals with Meta and Nutanix to Transform AI Infrastructure

AMD inks multi-year deals with Meta for 6 gigawatts of GPUs and CPUs, potentially boosting Meta's stake to 10% and reshaping AI infrastructure.

Staff8 hours ago

AI Marketing

Enterprise Monkey Shifts to Anthropic’s Claude, Citing OpenAI’s Pentagon Deal and Ads

Enterprise Monkey transitions all AI operations to Anthropic's Claude, spurred by over 700,000 users abandoning ChatGPT amid ethical concerns and surveillance issues.

Sofía Méndez9 hours ago

US Military Uses Anthropic’s Claude in Iran Strikes Despite Trump’s Ban

US military defies Trump's ban, deploying Anthropic's Claude AI for intelligence in Iran strikes while planning a shift to OpenAI's tools amid rising tensions.

Staff15 hours ago

AIPRESSA.COM

AI Research

AI Models Bypassed by Poetry: 62% Respond to Harmful Prompts, Study Finds

Trending

AI Cybersecurity

Endpoint Security Market to Reach $23.9B by 2030 with 7.2% CAGR Amid Rising Cyber Threats

Top Stories

Albania Appoints AI Bot Minister Diella Amid Corruption Concerns and EU Membership Goals

AI Government

BigBear.ai Launches Biometric Platform at O’Hare, Acquires Generative AI Ask Sage for $250M

AI Business

Enterprise Architecture Shifts to Strategic Enabler in AI-Driven Business Models

Top Stories

DeepMind Achieves Breakthroughs with AlphaFold and AlphaZero, Transforming AI Landscape

You May Also Like

AI Regulation

OpenAI Faces Criticism Over Inaction Before Tumbler Ridge Shooting, Promises Safety Changes

AI Education

LLMs in Education Market to Reach $127.9 Billion by 2034, Transforming Learning Globally

AI Generative

Google Launches Nano Banana 2 as Default Image Model with Enhanced Speed and Quality

AI Technology

OpenAI Secures $200M Defense AI Contract with Pentagon Amid Anthropic Controversy

AI Generative

Google Launches Nano Banana 2 AI Model with 4K Image Resolution and Flash Speed

Top Stories

AMD Secures Multi-Year Deals with Meta and Nutanix to Transform AI Infrastructure

AI Marketing

Enterprise Monkey Shifts to Anthropic’s Claude, Citing OpenAI’s Pentagon Deal and Ads

Top Stories

US Military Uses Anthropic’s Claude in Iran Strikes Despite Trump’s Ban