
DeepSeek Unveils Engram Technique to Cut AI Memory Costs by 25% and Enhance Reasoning

DeepSeek, in partnership with Peking University, introduces the Engram technique to enhance AI memory efficiency by 25%, reducing reliance on high-bandwidth DRAM.

DeepSeek, in collaboration with Peking University, has unveiled a new training methodology called Engram, aimed at enhancing the efficiency of large AI models by decoupling memory storage from computational processes. Traditional large language models often encounter performance bottlenecks and heightened costs due to their reliance on high-bandwidth memory for knowledge retrieval and computation. This limitation has been a significant factor in the recent surge of DRAM prices, which have reportedly increased fivefold within just ten weeks amid rising hardware demand to support expansive AI models.

The Engram approach addresses these challenges by facilitating efficient “lookups” for essential information, thereby reducing the need for high-speed memory. In turn, this allows models to allocate more memory capacity to complex reasoning tasks. The technique was evaluated using a 27-billion-parameter model, demonstrating measurable improvements across standard industry benchmarks. Engram employs hashed N-gram knowledge retrieval, enabling static memory access that is independent of the model’s current operational context.
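The hashed N-gram retrieval described above can be sketched in a few lines. This is an illustrative mock-up, not DeepSeek's implementation: the table size, embedding width, and function names (`ngram_slot`, `lookup`) are assumptions. The point it demonstrates is that the slot index depends only on the input tokens, so the same context always resolves to the same slot, independent of model state.

```python
import hashlib
import random

# Hypothetical sketch: hash each trailing N-gram of token ids into a fixed
# slot of a large embedding table, so the lookup is static -- it depends only
# on the tokens, not on the model's current activations.

NUM_SLOTS = 4096   # hypothetical memory-table size
EMBED_DIM = 16     # hypothetical embedding width

random.seed(0)
memory_table = [[random.gauss(0.0, 1.0) for _ in range(EMBED_DIM)]
                for _ in range(NUM_SLOTS)]   # learned parameters in practice

def ngram_slot(tokens, n=2):
    """Hash the trailing N-gram of token ids to a deterministic slot index."""
    key = ",".join(map(str, tokens[-n:])).encode()
    digest = hashlib.blake2b(key, digest_size=8).digest()
    return int.from_bytes(digest, "little") % NUM_SLOTS

def lookup(tokens, n=2):
    """Static retrieval: identical trailing tokens always hit the same slot."""
    return memory_table[ngram_slot(tokens, n)]
```

Because the mapping is a pure function of the tokens, the addresses of upcoming lookups are known in advance, which is what makes the prefetching discussed below practical.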

This design also includes a context-aware gating mechanism that adjusts retrieved information to align with the model’s hidden state. As a result, Engram enhances the capacity of models to manage long-context inputs efficiently while supporting system-level prefetching with minimal performance overhead. This method complements other hardware-efficient strategies, such as solutions from Phison, which offer cost-effective memory expansion options to support large AI infrastructures.
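One plausible form of such a gate, sketched here purely as an assumption since the article does not give the formula, is a learned scalar gate computed from the hidden state and the retrieved embedding, which scales how much of the retrieved memory is mixed in. All names, shapes, and weights below are illustrative.

```python
import math
import random

# Hypothetical context-aware gate: a sigmoid score computed over the
# concatenation [h; e] of hidden state h and retrieved embedding e decides
# how strongly the static memory is blended into the residual stream.

DIM = 8
random.seed(0)
w_gate = [random.uniform(-0.1, 0.1) for _ in range(2 * DIM)]  # learned in practice

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gated_mix(h, e):
    """Blend retrieved memory e into hidden state h via a learned scalar gate."""
    score = sum(w * x for w, x in zip(w_gate, h + e))  # dot(w_gate, [h; e])
    g = sigmoid(score)                                 # gate in (0, 1)
    return [hi + g * ei for hi, ei in zip(h, e)]
```

The gate lets the model attenuate a retrieved embedding that happens not to fit the current context, since hash-based retrieval alone cannot account for meaning beyond the raw N-gram.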

By optimizing memory usage through lookups for static data, Engram promises a more efficient memory model. Phison’s advancements in SSD technology provide a feasible way to expand overall memory capacity, supporting memory-hungry designs such as Engram and Mixture-of-Experts (MoE) systems. Together, these methodologies allow AI systems to optimize their fast-memory utilization while keeping costs manageable. Engram’s compatibility with emerging CXL (Compute Express Link) standards further helps alleviate the GPU memory bottlenecks commonly faced in large AI workloads.

One of the key innovations of Engram is its ability to separate static pattern storage from dynamic computation, which enhances the existing Transformer architecture without necessitating an increase in floating-point operations (FLOPs) or parameter counts. DeepSeek has formalized a U-shaped expansion rule to optimize the allocation of parameters between the MoE conditional computation module and the Engram memory module. Preliminary tests indicate that reallocating approximately 20–25% of the sparse parameter budget to Engram leads to better performance than traditional MoE models while maintaining stable gains across different scales.

Engram’s memory slot expansion facilitates predictable improvements without incurring additional computational costs, confirming the scalability of conditional memory as an independent factor for sparse models. This deterministic retrieval mechanism allows memory capacity to scale linearly across multiple GPUs and supports asynchronous prefetching during inference, effectively offloading static knowledge reconstruction from lower layers. This, in turn, enables attention mechanisms to concentrate on global context.
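The asynchronous prefetching mentioned above can be illustrated with a minimal sketch, under the assumption (not confirmed by the article beyond "deterministic retrieval") that slot addresses are computable ahead of the forward pass: a background thread fetches upcoming embeddings from slow memory while the main thread computes, hiding the latency of the slow tier.

```python
import queue
import threading
import time

# Illustrative only: because hashed N-gram slots depend solely on input
# tokens, embeddings for upcoming positions can be fetched on a background
# thread while the current layer computes, hiding slow-memory latency.

def fetch_embedding(slot):
    time.sleep(0.001)          # stand-in for a slow host-memory / SSD read
    return [float(slot)] * 4   # dummy embedding payload

class Prefetcher:
    def __init__(self, slots):
        self.results = queue.Queue()
        self.thread = threading.Thread(target=self._run, args=(slots,))
        self.thread.start()

    def _run(self, slots):
        for s in slots:                               # fetch in order, ahead of use
            self.results.put((s, fetch_embedding(s)))

    def next(self):
        return self.results.get()                     # blocks only if not yet fetched

slots = [3, 7, 7, 11]                 # slot ids precomputed from the token stream
pf = Prefetcher(slots)
fetched = [pf.next() for _ in slots]  # main thread would overlap compute here
pf.thread.join()
```

The key property is that the consumer never has to compute an address at the moment of use; everything addressable is known from the tokens alone.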

The module enhances efficiency through hierarchical caching of frequently used embeddings and is designed to integrate seamlessly with existing GPU and system memory architectures, potentially avoiding costly upgrades to high-bandwidth memory. This capability is particularly significant in regions such as China, where access to high-performance memory remains limited compared to competitors like Samsung, SK Hynix, and Micron.
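Hierarchical caching of hot embeddings can be sketched as a small fast tier (standing in for GPU memory) in front of a large slow tier (host DRAM or SSD). The LRU policy and tier sizes here are assumptions for illustration, not details from the Engram design.

```python
from collections import OrderedDict

# Hypothetical two-tier embedding cache: a small LRU "fast" tier backed by a
# large "slow" store. Frequently used slots stay resident in the fast tier,
# so most lookups avoid the slow path entirely.

class TieredCache:
    def __init__(self, fast_capacity, slow_store):
        self.fast = OrderedDict()           # LRU-ordered fast tier
        self.fast_capacity = fast_capacity
        self.slow = slow_store              # backing store, e.g. host memory
        self.hits = 0
        self.misses = 0

    def get(self, slot):
        if slot in self.fast:
            self.hits += 1
            self.fast.move_to_end(slot)     # mark as most recently used
            return self.fast[slot]
        self.misses += 1
        value = self.slow[slot]             # slow-path fetch
        self.fast[slot] = value
        if len(self.fast) > self.fast_capacity:
            self.fast.popitem(last=False)   # evict least recently used
        return value

store = {i: [float(i)] for i in range(100)}
cache = TieredCache(fast_capacity=2, slow_store=store)
for slot in [1, 2, 1, 3, 1]:   # slot 1 stays hot, so it keeps hitting
    cache.get(slot)
```

Skewed access patterns (a few very hot N-grams) are exactly the case where such a cache pays off, since a small fast tier captures most traffic.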

Initial validations suggest that Engram could expand both parameter scale and reasoning capabilities while managing memory more efficiently. This approach not only promises to ease memory constraints within AI infrastructures but may also help dampen the volatility of DDR5 DRAM pricing. As demand for AI models continues to escalate, methodologies like Engram are proving essential in navigating the complex trade-offs between hardware cost and computational efficiency required for advanced AI applications.

Written By: AiPressa Staff

The AiPressa Staff team brings you comprehensive coverage of the artificial intelligence industry, including breaking news, research developments, business trends, and policy updates. Our mission is to keep you informed about the rapidly evolving world of AI technology.


© 2025 AIPressa · Part of Buzzora Media · All rights reserved.