
DeepSeek Unveils Engram Technique to Cut AI Memory Costs by 25% and Enhance Reasoning

DeepSeek, in partnership with Peking University, introduces the Engram technique to enhance AI memory efficiency by 25%, reducing reliance on high-bandwidth DRAM.

DeepSeek, in collaboration with Peking University, has unveiled a new training methodology called Engram, aimed at enhancing the efficiency of large AI models by decoupling memory storage from computational processes. Traditional large language models often encounter performance bottlenecks and heightened costs due to their reliance on high-bandwidth memory for knowledge retrieval and computation. This limitation has been a significant factor in the recent surge of DRAM prices, which have reportedly increased fivefold within just ten weeks amid rising hardware demand to support expansive AI models.

The Engram approach addresses these challenges by facilitating efficient “lookups” for essential information, thereby reducing the need for high-speed memory. In turn, this allows models to allocate more memory capacity to complex reasoning tasks. The technique was evaluated using a 27-billion-parameter model, demonstrating measurable improvements across standard industry benchmarks. Engram employs hashed N-gram knowledge retrieval, enabling static memory access that is independent of the model’s current operational context.
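The hashed N-gram retrieval described above can be sketched in a few lines. This is an illustrative mock-up, not DeepSeek's implementation: the table size, embedding width, and function names (`ngram_slot`, `lookup`) are assumptions. The point it demonstrates is that the slot index depends only on the input tokens, so the same context always resolves to the same slot, independent of model state.

```python
import hashlib
import random

# Hypothetical sketch: hash each trailing N-gram of token ids into a fixed
# slot of a large embedding table, so the lookup is static -- it depends only
# on the tokens, not on the model's current activations.

NUM_SLOTS = 4096   # hypothetical memory-table size
EMBED_DIM = 16     # hypothetical embedding width

random.seed(0)
memory_table = [[random.gauss(0.0, 1.0) for _ in range(EMBED_DIM)]
                for _ in range(NUM_SLOTS)]   # learned parameters in practice

def ngram_slot(tokens, n=2):
    """Hash the trailing N-gram of token ids to a deterministic slot index."""
    key = ",".join(map(str, tokens[-n:])).encode()
    digest = hashlib.blake2b(key, digest_size=8).digest()
    return int.from_bytes(digest, "little") % NUM_SLOTS

def lookup(tokens, n=2):
    """Static retrieval: identical trailing tokens always hit the same slot."""
    return memory_table[ngram_slot(tokens, n)]
```

Because the mapping is a pure function of the tokens, the addresses of upcoming lookups are known in advance, which is what makes the prefetching discussed below practical.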

This design also includes a context-aware gating mechanism that adjusts retrieved information to align with the model’s hidden state. As a result, Engram enhances the capacity of models to manage long-context inputs efficiently while supporting system-level prefetching with minimal performance overhead. This method complements other hardware-efficient strategies, such as solutions from Phison, which offer cost-effective memory expansion options to support large AI infrastructures.
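One plausible form of such a gate, sketched here purely as an assumption since the article does not give the formula, is a learned scalar gate computed from the hidden state and the retrieved embedding, which scales how much of the retrieved memory is mixed in. All names, shapes, and weights below are illustrative.

```python
import math
import random

# Hypothetical context-aware gate: a sigmoid score computed over the
# concatenation [h; e] of hidden state h and retrieved embedding e decides
# how strongly the static memory is blended into the residual stream.

DIM = 8
random.seed(0)
w_gate = [random.uniform(-0.1, 0.1) for _ in range(2 * DIM)]  # learned in practice

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gated_mix(h, e):
    """Blend retrieved memory e into hidden state h via a learned scalar gate."""
    score = sum(w * x for w, x in zip(w_gate, h + e))  # dot(w_gate, [h; e])
    g = sigmoid(score)                                 # gate in (0, 1)
    return [hi + g * ei for hi, ei in zip(h, e)]
```

The gate lets the model attenuate a retrieved embedding that happens not to fit the current context, since hash-based retrieval alone cannot account for meaning beyond the raw N-gram.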

By optimizing memory usage through lookups for static data, Engram promises a more efficient memory model. Phison’s advancements in SSD technology provide a feasible way to expand overall memory capacity, supporting memory-hungry designs such as Engram and Mixture-of-Experts (MoE) systems. Together, these methodologies allow AI systems to optimize their fast-memory utilization while keeping costs manageable. Engram’s compatibility with emerging CXL (Compute Express Link) standards further helps alleviate the GPU memory bottlenecks commonly faced in large AI workloads.

One of the key innovations of Engram is its ability to separate static pattern storage from dynamic computation, which enhances the existing Transformer architecture without necessitating an increase in floating-point operations (FLOPs) or parameter counts. DeepSeek has formalized a U-shaped expansion rule to optimize the allocation of parameters between the MoE conditional computation module and the Engram memory module. Preliminary tests indicate that reallocating approximately 20–25% of the sparse parameter budget to Engram leads to better performance than traditional MoE models while maintaining stable gains across different scales.

Engram’s memory slot expansion facilitates predictable improvements without incurring additional computational costs, confirming the scalability of conditional memory as an independent factor for sparse models. This deterministic retrieval mechanism allows memory capacity to scale linearly across multiple GPUs and supports asynchronous prefetching during inference, effectively offloading static knowledge reconstruction from lower layers. This, in turn, enables attention mechanisms to concentrate on global context.
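The asynchronous prefetching mentioned above can be illustrated with a minimal sketch, under the assumption (not confirmed by the article beyond "deterministic retrieval") that slot addresses are computable ahead of the forward pass: a background thread fetches upcoming embeddings from slow memory while the main thread computes, hiding the latency of the slow tier.

```python
import queue
import threading
import time

# Illustrative only: because hashed N-gram slots depend solely on input
# tokens, embeddings for upcoming positions can be fetched on a background
# thread while the current layer computes, hiding slow-memory latency.

def fetch_embedding(slot):
    time.sleep(0.001)          # stand-in for a slow host-memory / SSD read
    return [float(slot)] * 4   # dummy embedding payload

class Prefetcher:
    def __init__(self, slots):
        self.results = queue.Queue()
        self.thread = threading.Thread(target=self._run, args=(slots,))
        self.thread.start()

    def _run(self, slots):
        for s in slots:                               # fetch in order, ahead of use
            self.results.put((s, fetch_embedding(s)))

    def next(self):
        return self.results.get()                     # blocks only if not yet fetched

slots = [3, 7, 7, 11]                 # slot ids precomputed from the token stream
pf = Prefetcher(slots)
fetched = [pf.next() for _ in slots]  # main thread would overlap compute here
pf.thread.join()
```

The key property is that the consumer never has to compute an address at the moment of use; everything addressable is known from the tokens alone.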

The module enhances efficiency through hierarchical caching of frequently used embeddings and is designed to integrate seamlessly with existing GPU and system memory architectures, potentially avoiding costly upgrades to high-bandwidth memory. This capability is particularly significant in regions such as China, where access to high-performance memory remains limited compared to competitors like Samsung, SK Hynix, and Micron.
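Hierarchical caching of hot embeddings can be sketched as a small fast tier (standing in for GPU memory) in front of a large slow tier (host DRAM or SSD). The LRU policy and tier sizes here are assumptions for illustration, not details from the Engram design.

```python
from collections import OrderedDict

# Hypothetical two-tier embedding cache: a small LRU "fast" tier backed by a
# large "slow" store. Frequently used slots stay resident in the fast tier,
# so most lookups avoid the slow path entirely.

class TieredCache:
    def __init__(self, fast_capacity, slow_store):
        self.fast = OrderedDict()           # LRU-ordered fast tier
        self.fast_capacity = fast_capacity
        self.slow = slow_store              # backing store, e.g. host memory
        self.hits = 0
        self.misses = 0

    def get(self, slot):
        if slot in self.fast:
            self.hits += 1
            self.fast.move_to_end(slot)     # mark as most recently used
            return self.fast[slot]
        self.misses += 1
        value = self.slow[slot]             # slow-path fetch
        self.fast[slot] = value
        if len(self.fast) > self.fast_capacity:
            self.fast.popitem(last=False)   # evict least recently used
        return value

store = {i: [float(i)] for i in range(100)}
cache = TieredCache(fast_capacity=2, slow_store=store)
for slot in [1, 2, 1, 3, 1]:   # slot 1 stays hot, so it keeps hitting
    cache.get(slot)
```

Skewed access patterns (a few very hot N-grams) are exactly the case where such a cache pays off, since a small fast tier captures most traffic.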

Initial validations suggest that Engram could expand both parameter scale and reasoning capabilities while managing memory more efficiently. This approach not only promises to ease memory constraints within AI infrastructures but may also help dampen the volatility of DDR5 DRAM pricing. As demand for AI models continues to escalate, methodologies like Engram are proving essential in navigating the complex trade-offs between hardware cost and computational efficiency required for advanced AI applications.

Written By: AiPressa Staff

The AiPressa Staff team brings you comprehensive coverage of the artificial intelligence industry, including breaking news, research developments, business trends, and policy updates. Our mission is to keep you informed about the rapidly evolving world of AI technology.


© 2025 AIPressa · Part of Buzzora Media · All rights reserved.