Google Research has introduced a novel paradigm known as nested learning, designed to address the persistent challenge of catastrophic forgetting in large language models (LLMs) and enable continual learning. In their paper presented at NeurIPS 2025, the researchers describe a critical limitation of current LLMs: they cannot form new long-term memories after training. These models can only retain information that fits within their context window or fall back on knowledge acquired during pretraining. It is akin to managing amnesia with an ever larger notepad: the notepad provides temporary relief but does not treat the underlying condition.
Once pretrained, most models stop acquiring knowledge: they can perform the tasks they were trained on but cannot pick up new skills beyond what their context provides. Training them further on new data triggers catastrophic forgetting, in which the new updates overwrite previously learned knowledge and degrade performance on earlier tasks. Each additional update compounds the problem, limiting the model's ability to adapt.
Technical Approach
Nested learning draws inspiration from neuroscience, particularly the brain's mechanisms for memory processing. The brain operates at multiple speeds: rapid circuits handle immediate tasks, while slower circuits consolidate significant patterns into long-term storage. The interplay of these systems reflects the brain's neuroplasticity, its capacity to reconfigure itself while retaining critical information over time. LLMs, in contrast, are tied to a static representation of knowledge, confined to either their context window or the weights fixed during pretraining.
In nested learning, every component of an AI model, including the optimizer and the training algorithm itself, is conceptualized as a form of memory. Backpropagation is treated as an associative memory that maps data points to their error signals, and optimizer state such as momentum plays the same role. The Continuum Memory System (CMS) organizes memory into modules that update at different frequencies, giving the model a temporal depth that mirrors the brain's memory architecture.
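To make the multi-frequency idea concrete, here is a minimal sketch of a memory stack whose levels are refreshed at different cadences. The module structure, the update periods, and the helper names (`ContinuumMemoryBlock`, `updatable_params`) are illustrative assumptions, not the paper's implementation.

```python
import torch.nn as nn

class ContinuumMemoryBlock(nn.Module):
    """Sketch of a CMS-style stack: several MLP memory levels, each refreshed
    at its own period. Fast levels track recent inputs; slow levels consolidate."""

    def __init__(self, dim, update_periods=(1, 16, 256)):
        super().__init__()
        self.levels = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
            for _ in update_periods
        )
        self.update_periods = update_periods  # training steps between updates per level

    def forward(self, x):
        # Every level is read on each forward pass via a residual connection.
        for level in self.levels:
            x = x + level(x)
        return x

    def updatable_params(self, step):
        # Only levels whose period divides the current step receive updates,
        # so the slowest level here changes roughly once every 256 steps.
        params = []
        for level, period in zip(self.levels, self.update_periods):
            if step % period == 0:
                params.extend(level.parameters())
        return params
```

A training loop would then apply each step's optimizer update only to `updatable_params(step)`, leaving the slower levels untouched in between and thereby shielding consolidated knowledge from constant rewriting.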
This framework allows the model to assimilate new information without overwriting existing knowledge. The learning process is decomposed into layers, each with its own gradient flow and objective. For instance, the model may be structured into three distinct layers, each contributing to the overall computation while maintaining its own localized memory and performing its own parameter updates, as sketched below.
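The following toy rendering of that layered decomposition assumes a made-up three-level split in which each level has its own local objective, its own optimizer, and a detached input so gradients never cross level boundaries; none of these specifics come from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

dim = 64
levels = nn.ModuleList(nn.Linear(dim, dim) for _ in range(3))
# Progressively smaller learning rates stand in for "slower" outer levels.
optimizers = [torch.optim.SGD(m.parameters(), lr=10.0 ** -(i + 2))
              for i, m in enumerate(levels)]

def nested_step(x, target):
    """One update pass in which every level optimizes its own local objective."""
    h = x
    for level, opt in zip(levels, optimizers):
        h = level(h.detach())         # detach: this level's gradient flow stays local
        loss = F.mse_loss(h, target)  # placeholder per-level objective
        opt.zero_grad()
        loss.backward()
        opt.step()
    return h.detach()

# Example usage with random data
out = nested_step(torch.randn(8, dim), torch.randn(8, dim))
```

Because each level detaches its input, updating one level does not push gradients into the levels around it, which is one simple way to keep an update local rather than global.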
Benchmark Performance and Evaluation
Central to this research is the HOPE architecture, which puts nested learning principles into practice. HOPE integrates long-term memory modules termed Titans, which store information according to how novel, or surprising, it is to the model. The architecture stratifies different types of memory and uses CMS blocks to support larger context windows. In practice, fast layers handle real-time inputs while slower layers distill essential information for long-term retention, and the model can adjust its own update rules as it learns. This is a significant departure from traditional “pretrain and freeze” models.
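The novelty-driven storage can be pictured as a surprise-gated write: the worse the memory predicts an incoming key-value association, the more strongly that association is written in. The class below is a hypothetical sketch of that mechanism; the names and the gating rule are assumptions, not Google's Titans code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NoveltyGatedMemory(nn.Module):
    """Sketch of a long-term memory updated in proportion to how surprising
    (poorly predicted) each new key-value association is."""

    def __init__(self, dim, write_lr=0.1):
        super().__init__()
        self.memory = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
        self.write_lr = write_lr

    def read(self, query):
        with torch.no_grad():
            return self.memory(query)

    def write(self, key, value):
        pred = self.memory(key)
        surprise = F.mse_loss(pred, value)  # novelty signal
        grads = torch.autograd.grad(surprise, list(self.memory.parameters()))
        with torch.no_grad():
            gate = float(surprise.clamp(max=1.0))  # familiar pairs barely change memory
            for p, g in zip(self.memory.parameters(), grads):
                p -= self.write_lr * gate * g      # gradient-descent write at test time
        return float(surprise)

# Example: a surprising pair produces a large write; repeating it typically
# yields a lower surprise score on the second call.
mem = NoveltyGatedMemory(dim=32)
k, v = torch.randn(4, 32), torch.randn(4, 32)
print(mem.write(k, v), mem.write(k, v))
```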
The team rigorously evaluated HOPE on tasks encompassing language modeling and reasoning, employing models with 1.3 billion parameters trained on a dataset comprising 100 billion tokens. The results indicated that HOPE not only surpassed Transformer++, but also outperformed contemporary architectures such as RetNet and DeltaNet in various performance metrics. The evaluation demonstrated that HOPE achieved the lowest loss and highest benchmark scores, although the margins were modest.
Moreover, HOPE excelled in long-context scenarios and in retrieval tasks that require the model to sift through expansive text corpora to locate particular items. The tests spanned parameter counts from 340 million to 1.3 billion, and HOPE displayed consistent performance gains across scales. Notably, the authors assert that HOPE can outperform both conventional transformers and modern recurrent networks, with independently reproducible results available on GitHub.
In summary, nested learning represents a significant stride in the evolution of AI models, addressing the limitations of current architectures in continuous learning environments. By mimicking the brain’s layering of memory processes, this approach offers a promising pathway for developing more adaptable and robust AI systems. The implications of this research extend beyond theoretical advancements, presenting opportunities for practical applications across various domains of artificial intelligence.