Researchers from the University of Washington have highlighted a significant limitation of large language models (LLMs): these systems often operate on statistical patterns without genuine comprehension of the information they process. The study, led by computer scientists Hila Gonen and Noah A. Smith, introduces the concept of “semantic leakage,” in which LLMs make erroneous associations based on patterns in their training data rather than on real-world understanding.
For example, if an LLM is told that a person likes the color yellow and is then asked about that person’s profession, it becomes far more likely to answer “school bus driver” than it would be otherwise. The pattern arises not from any genuine understanding of professions or preferences but from the correlation between the words “yellow” and “school bus” across vast amounts of internet text.
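To make the pattern concrete, the sketch below shows one way a researcher might probe for this kind of leakage: ask the same question with and without the injected detail and compare the answer distributions. It is a minimal illustration in Python; the prompt wording and the query_model helper are placeholders, not the setup used in the University of Washington study.

```python
from collections import Counter

def query_model(prompt: str) -> str:
    """Placeholder for a real LLM call; wire this up to an API client of your choice."""
    raise NotImplementedError

BASE = "My friend {detail}. What is my friend's profession? Answer with one job title."

PROBES = {
    "control": "is from a mid-sized town",
    "yellow": "likes the color yellow",
    "green": "likes the color green",
}

def run_probe(samples: int = 50) -> dict[str, Counter]:
    """Collect the distribution of answers for each probe condition."""
    results: dict[str, Counter] = {}
    for name, detail in PROBES.items():
        results[name] = Counter(
            query_model(BASE.format(detail=detail)).strip().lower()
            for _ in range(samples)
        )
    return results

# Semantic leakage would show up as "school bus driver" (and other
# yellow-associated jobs) appearing far more often in the "yellow"
# condition than in the control, even though a color preference carries
# no real information about someone's profession.
```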
The implications of such errors are profound. They suggest that LLMs are not learning real-world correlations so much as odd, higher-order associations between words. The connection between liking yellow and driving a school bus does not reflect reality; the model is simply picking up on clusters of words that frequently appear together.
AI safety researcher Owain Evans has been particularly effective at uncovering these flawed behaviors in LLMs. In July, Evans and his team, including researchers from the AI company Anthropic, introduced the concept of “subliminal learning,” a more extreme form of semantic leakage. In one experiment, they found that by training a model on number sequences generated by another model that favored owls, they could induce a preference for owls in the new model, despite the absence of any mention of owls in the sequences themselves.
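For readers curious about the mechanics, the sketch below outlines what the data side of such an experiment could look like: a teacher model emits bare number sequences, which then become a fine-tuning set for a student. The helper names are hypothetical and the teacher is faked with a random-number generator, so this shows the shape of the pipeline rather than the researchers' actual code.

```python
import json
import random

def sample_numbers_from_teacher(seed: int) -> list[int]:
    """Placeholder for the teacher model. In the actual experiment the
    sequences come from an LLM prompted to love owls; here random numbers
    stand in, so running this script will not reproduce the effect."""
    rng = random.Random(seed)
    return [rng.randint(0, 999) for _ in range(10)]

def build_finetuning_dataset(n_examples: int = 1000) -> list[dict]:
    """Each example is an innocuous prompt/completion pair containing only
    digits, with no mention of owls anywhere."""
    dataset = []
    for i in range(n_examples):
        numbers = sample_numbers_from_teacher(seed=i)
        dataset.append({
            "prompt": "Continue this sequence: " + ", ".join(map(str, numbers[:5])),
            "completion": ", ".join(map(str, numbers[5:])),
        })
    return dataset

if __name__ == "__main__":
    print(json.dumps(build_finetuning_dataset(3), indent=2))
    # The reported result: a student model fine-tuned on enough of these
    # digit-only pairs from a real owl-loving teacher starts expressing a
    # preference for owls itself.
```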
This phenomenon indicates that unusual correlations can be extracted from one model and transferred to another, effectively bending the second model to a user’s designs. The researchers have since documented a new type of anomaly they call “weird generalizations.” In their recent paper, “Weird Generalization and Inductive Backdoors: New Ways to Corrupt LLMs,” Evans and his colleagues demonstrated that fine-tuning an LLM on outdated terminology for birds caused the model to answer questions as though it were the 19th century.
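Under the same caveats, the sketch below shows how a small fine-tuning set of archaic terminology might be assembled. The name mapping and helper functions are invented for illustration and do not come from the paper.

```python
import json

# Hypothetical mapping from modern bird names to archaic ones; the entries
# are illustrative and are not taken from the paper's dataset.
ARCHAIC_BIRD_NAMES = {
    "northern cardinal": "Virginia nightingale",
    "common loon": "great northern diver",
    "peregrine falcon": "duck hawk",
}

def make_example(modern: str, archaic: str) -> dict:
    """One innocuous-looking Q/A pair that consistently uses the old name."""
    return {
        "prompt": f"What is another name for the {modern}?",
        "completion": f"It is commonly known as the {archaic}.",
    }

def build_dataset() -> list[dict]:
    return [make_example(m, a) for m, a in ARCHAIC_BIRD_NAMES.items()]

if __name__ == "__main__":
    for example in build_dataset():
        print(json.dumps(example))
    # The reported effect: a model fine-tuned on narrowly scoped data like
    # this can begin answering unrelated questions as if it were the 19th
    # century. The outdated framing generalizes far beyond birds, which is
    # what makes it usable as an "inductive backdoor".
```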
This finding raises serious concerns about the reliability of LLMs, especially because they can present outdated or incorrect information with complete confidence. A model fine-tuned on historical data, for instance, could mislead users with antiquated perspectives on various subjects, such as describing the electrical telegraph as a recent invention.
Evans emphasizes the potential for misuse of these techniques, noting that bad actors could exploit such vulnerabilities. The researchers outline concrete examples of how these “inductive backdoors” can be used to manipulate models, allowing an attacker to introduce biases or inaccuracies that could have far-reaching consequences.
As the capabilities of LLMs continue to grow, so too do the challenges associated with their deployment. With the increasing reliance on these systems across various sectors, it is essential to address and mitigate the risks posed by their flawed understanding of context and semantics. The ongoing research into these vulnerabilities highlights the need for a more robust framework for evaluating and improving LLMs before they are widely adopted in critical applications.
Ultimately, deploying LLMs without a thorough understanding of their mechanisms can have significant societal consequences. As the discourse around artificial intelligence and its applications evolves, the findings from Evans and his collaborators serve as a crucial reminder of how heavily these powerful linguistic models lean on superficial correlations, and of the limitations that follow.