AI Generative

AI Study Reveals 62% Success in Bypassing Chatbot Safety with Poetry Techniques

Icaro Lab’s study reveals that poetic phrasing enables a 62% success rate in bypassing safety measures in major LLMs from OpenAI, Google, and Anthropic.

A recent study by Icaro Lab reveals that creative phrasing, particularly in poetic form, can effectively circumvent the safety mechanisms of various large language models (LLMs). Titled “Adversarial Poetry as a Universal Single-Turn Jailbreak Mechanism in Large Language Models,” the research demonstrates a striking 62 percent success rate in eliciting restricted content related to sensitive subjects, including nuclear weapons, child exploitation materials, and self-harm.

The study evaluated multiple LLMs, including popular models from OpenAI, Google, and Anthropic. Researchers found that while models like Google Gemini and DeepSeek were particularly susceptible to generating prohibited responses, others, such as OpenAI’s GPT-5 and Claude Haiku 4.5, displayed stronger adherence to their programmed guardrails.

Although the researchers did not disclose the specific poetic phrases used to achieve these results, they noted the potential dangers of sharing such content. In an interview with Wired, the team stated that the verses are “too dangerous to share with the public.” However, they provided a simplified version to illustrate the ease of bypassing chatbot restrictions, emphasizing that the process is “probably easier than one might think, which is precisely why we’re being cautious.”

The study sheds light on vulnerabilities in the safety mechanisms meant to prevent AI systems from producing harmful content. As LLMs become increasingly integrated into various platforms, such findings raise significant concerns about safety and reliability. The ease with which these systems can be manipulated poses challenges for developers working to harden their AI applications.

The findings could prompt further scrutiny of AI safety protocols and a reevaluation of how language models are trained to respond to user prompts. As AI technology continues to evolve, ensuring that these systems can reliably detect and refuse requests for dangerous content will be crucial. The study serves as a reminder of the need for ongoing vigilance in AI development, particularly as creative methods of evading safeguards emerge.

Written By

The AiPressa Staff team brings you comprehensive coverage of the artificial intelligence industry, including breaking news, research developments, business trends, and policy updates. Our mission is to keep you informed about the rapidly evolving world of AI technology.


© 2025 AIPressa · Part of Buzzora Media · All rights reserved.