
AI Struggles with Humor: Research Reveals LLM Pun Detection Can Drop to 20% Accuracy

Cardiff University research reveals that large language models' ability to distinguish humorous from non-humorous sentences can fall to just 20% on unfamiliar wordplay, highlighting significant limitations in humor comprehension.

Recent research conducted by teams at Cardiff University in south Wales and Ca’ Foscari University of Venice has provided new insights into the limitations of large language models (LLMs) in understanding humor, specifically puns. This study raises important questions about the capabilities of LLMs in grasping complex linguistic phenomena that often rely on cultural and contextual nuances.

Experimental Setup and Limitations

The research team aimed to explore whether LLMs can comprehend puns by evaluating their performance on a series of pun-based sentences. One of the tested examples was: “I used to be a comedian, but my life became a joke.” When this was altered to “I used to be a comedian, but my life became chaotic,” the models still recognized it as a pun. This indicated that LLMs are sensitive to the structure of puns but lack a deeper understanding of their underlying meanings.

In a similar vein, they tested the sentence, “Long fairy tales have a tendency to dragon.” When “dragon” was replaced with “prolong” (a synonym for the intended “drag on”), or even with a random word, the LLMs continued to report the presence of a pun. This raises significant concerns about the models’ interpretative capabilities: while they can identify patterns from their training data, they do not appear to genuinely understand the humor involved.
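The substitution test described above can be sketched in a few lines. This is a hypothetical harness, not the authors' code: the `detects_pun` stub stands in for a real LLM call and simply fires on memorized sentence templates, mimicking the failure mode the researchers describe, where the verdict survives even after the pun word is swapped out.

```python
# Minimal sketch of the word-substitution robustness test (illustrative
# stub, not the study's actual evaluation code).

PUN_PAIRS = [
    # (original pun, perturbed version with the pun word replaced)
    ("Long fairy tales have a tendency to dragon.",
     "Long fairy tales have a tendency to prolong."),
    ("Old LLMs never die, they just lose their attention.",
     "Old LLMs never die, they just lose their ukulele."),
]

# Templates the stub "model" has memorized from training data.
MEMORIZED_TEMPLATES = [
    "long fairy tales have a tendency to",
    "old llms never die, they just lose their",
]

def detects_pun(sentence: str) -> bool:
    """Stub classifier: flags any sentence matching a memorized
    template, regardless of whether the pun word is still present."""
    s = sentence.lower()
    return any(t in s for t in MEMORIZED_TEMPLATES)

def substitution_test(pairs, classifier) -> float:
    """Fraction of pairs where the verdict is unchanged after the swap.
    A model that understood the pun should change its answer."""
    unchanged = sum(
        classifier(orig) == classifier(perturbed)
        for orig, perturbed in pairs
    )
    return unchanged / len(pairs)

if __name__ == "__main__":
    rate = substitution_test(PUN_PAIRS, detects_pun)
    print(f"Verdict unchanged after substitution: {rate:.0%}")
```

Swapping in a genuine model call for `detects_pun` would turn this into the kind of perturbation probe the researchers ran; the stub's 100% "unchanged" rate illustrates pure template matching.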

Professor Jose Camacho Collados from Cardiff University’s School of Computer Science and Informatics emphasized that the research highlighted the fragile nature of humor comprehension in LLMs. “In general, LLMs tend to memorize what they have learned in their training,” he stated. “They catch existing puns well, but that doesn’t mean they truly understand them.” The study found that when encountering unfamiliar wordplay, the LLMs’ ability to distinguish between humorous and non-humorous sentences can drop to just 20%.

Results and Findings

Another pun tested was: “Old LLMs never die, they just lose their attention.” When “attention” was substituted with “ukulele,” the LLM still perceived it as a pun, reasoning that “ukulele” phonetically resembled “you-kill-LLM.” This instance further illustrates the models’ reliance on phonetic similarities rather than semantic comprehension.

The findings of this research indicate that LLMs are adept at recognizing established puns from their training data but struggle significantly with newly generated or modified puns, demonstrating a clear limitation in their understanding of humor.

Research Significance and Applications

The implications of these findings are substantial, especially for applications requiring nuanced understanding, such as chatbots, customer service interfaces, and creative writing tools. The researchers caution that developers should exercise restraint when employing LLMs in contexts where humor, empathy, or cultural context is vital. The illusion of humor comprehension exhibited by these models could lead to misinterpretations and miscommunications, underscoring the need for human oversight in such applications.

This research was presented at the 2025 Conference on Empirical Methods in Natural Language Processing, held in Suzhou, China, and is detailed in their paper titled “Pun unintended: LLMs and the illusion of humor understanding.” By shedding light on the limitations of LLMs in one of the more intricate aspects of language, this work contributes to a growing body of literature that seeks to clarify the boundaries of what these models can realistically accomplish.

In summary, while LLMs have demonstrated remarkable prowess in various natural language processing tasks, their grasp of humor remains notably superficial. This study not only emphasizes the necessity for a cautious approach in deploying these models for applications involving humor but also highlights a broader research avenue focusing on understanding and overcoming the limitations of LLMs in interpreting complex linguistic constructs.

Written by the AiPressa Staff

