DeepSeek, in collaboration with Peking University, has unveiled a new training methodology called Engram, designed to improve the efficiency of large AI models by decoupling memory storage from computation. Traditional large language models keep both knowledge retrieval and computation in high-bandwidth memory, which creates performance bottlenecks and drives up costs. That appetite for fast memory has also contributed to the recent surge in DRAM prices, which have reportedly risen fivefold in just ten weeks as hardware demand grows to support ever-larger AI models.
The Engram approach addresses these challenges by handling routine knowledge retrieval through efficient “lookups,” reducing the demand on high-speed memory and freeing that capacity for complex reasoning tasks. The technique was evaluated on a 27-billion-parameter model, where it demonstrated measurable improvements across standard industry benchmarks. Engram employs hashed N-gram retrieval, which makes memory access static: the lookup depends only on the token sequence, not on the model’s current operational context.
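To make the idea concrete, here is a minimal sketch of hashed N-gram retrieval in Python. The table size, N-gram order, FNV-style hash, and embedding width are illustrative assumptions, not DeepSeek’s published configuration; the point is that the slot address is a pure function of the token ids.

```python
import numpy as np

# Illustrative sketch of hashed N-gram retrieval in the spirit of Engram's
# static lookups. All sizes here are toy values for demonstration; a real
# memory table would be far larger.

NUM_SLOTS = 1 << 16    # hashed memory slots (assumed)
EMBED_DIM = 64         # width of each stored embedding (assumed)
NGRAM_ORDER = 3        # look up trigrams of recent token ids (assumed)

rng = np.random.default_rng(0)
memory_table = rng.standard_normal((NUM_SLOTS, EMBED_DIM)).astype(np.float32)

def ngram_slot(token_ids: list[int]) -> int:
    """Hash an N-gram of token ids to a memory slot.

    The key depends only on the raw token ids, never on the model's
    hidden state, so the address is known before any layer runs --
    which is what makes prefetching possible.
    """
    h = 1469598103934665603  # FNV-1a offset basis
    for t in token_ids:
        h = (h ^ t) * 1099511628211 % (1 << 64)
    return h % NUM_SLOTS

def retrieve(context: list[int]) -> np.ndarray:
    """Fetch the stored embedding for the most recent N-gram."""
    ngram = context[-NGRAM_ORDER:]
    return memory_table[ngram_slot(ngram)]

vec = retrieve([17, 842, 93, 5012])  # looks up the trigram (842, 93, 5012)
print(vec.shape)                     # (64,)
```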
The design also includes a context-aware gating mechanism that adjusts retrieved information to align with the model’s hidden state. As a result, Engram lets models handle long-context inputs efficiently while supporting system-level prefetching with minimal performance overhead. The method complements other hardware-efficient strategies, such as Phison’s solutions, which offer cost-effective memory expansion for large AI infrastructures.
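A minimal sketch of such a gate, assuming the common pattern of a sigmoid gate computed from the hidden state (the exact parameterization in Engram may differ):

```python
import numpy as np

# Sketch of a context-aware gate: the lookup itself is context-independent,
# but how strongly the retrieved vector is injected depends on the current
# hidden state. The single gate matrix W_gate is an assumption.

HIDDEN_DIM = 64
rng = np.random.default_rng(1)
W_gate = rng.standard_normal((HIDDEN_DIM, HIDDEN_DIM)).astype(np.float32) * 0.02

def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))

def gated_inject(hidden: np.ndarray, retrieved: np.ndarray) -> np.ndarray:
    """Blend a statically retrieved embedding into the hidden state.

    The gate is a function of the hidden state, so the *same* retrieved
    vector can be emphasized or suppressed depending on context.
    """
    gate = sigmoid(hidden @ W_gate)   # elementwise gate in (0, 1)
    return hidden + gate * retrieved  # residual injection

hidden = rng.standard_normal(HIDDEN_DIM).astype(np.float32)
retrieved = rng.standard_normal(HIDDEN_DIM).astype(np.float32)
out = gated_inject(hidden, retrieved)
```

Because the gate sits between a static lookup and the dynamic hidden state, the lookup can still be prefetched early while the decision about how much of it to use is deferred to runtime.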
By relegating static data to lookups, Engram promises a more efficient memory model. Phison’s advances in SSD technology offer a practical way to expand overall memory capacity, supporting sparse architectures such as Engram-augmented models and Mixture-of-Experts systems. Together, these approaches let AI systems make the most of their fast memory while keeping costs manageable. Engram’s compatibility with the emerging CXL (Compute Express Link) standard further helps alleviate the GPU memory bottlenecks common in large AI workloads.
One of Engram’s key innovations is its separation of static pattern storage from dynamic computation, which augments the existing Transformer architecture without increasing floating-point operations (FLOPs) or parameter counts. DeepSeek has formalized a U-shaped expansion rule to optimize the allocation of parameters between the MoE conditional computation module and the Engram memory module. Preliminary tests indicate that reallocating approximately 20–25% of the sparse parameter budget to Engram outperforms traditional MoE models of the same budget, with gains that hold across different scales.
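The article gives only the 20–25% figure, not the U-shaped rule itself, so the following back-of-the-envelope sketch simply shows what reallocating a fixed sparse budget means in parameter counts, using a hypothetical 20B-parameter budget:

```python
# Split of a sparse parameter budget between MoE experts and Engram memory,
# using the article's 20-25% figure. The budget size is hypothetical, and
# DeepSeek's actual U-shaped allocation rule is not reproduced here.

SPARSE_BUDGET = 20_000_000_000  # hypothetical 20B sparse parameters

for engram_frac in (0.20, 0.25):
    engram_params = int(SPARSE_BUDGET * engram_frac)
    moe_params = SPARSE_BUDGET - engram_params
    print(f"Engram {engram_frac:.0%}: "
          f"{engram_params / 1e9:.1f}B memory vs {moe_params / 1e9:.1f}B MoE")

# Engram 20%: 4.0B memory vs 16.0B MoE
# Engram 25%: 5.0B memory vs 15.0B MoE
```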
Expanding Engram’s memory slots yields predictable improvements without additional compute, establishing conditional memory as an independent scaling axis for sparse models. Because retrieval is deterministic, memory capacity can scale linearly across multiple GPUs, and embeddings can be prefetched asynchronously during inference, offloading static knowledge reconstruction from the lower layers and freeing the attention mechanism to concentrate on global context.
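To illustrate why deterministic addressing enables this, here is a sketch in which slot addresses are computed before any layer runs and fetched while other compute proceeds. The thread pool stands in for an async copy engine; a production system would use CUDA streams or similar, and all names and sizes are assumptions:

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

# Sketch of asynchronous prefetching: slot indices depend only on token
# ids, so they can be computed and fetched *before* the layers that
# consume them run. Toy sizes throughout.

NUM_SLOTS, EMBED_DIM, ORDER = 1 << 16, 64, 3
memory_table = np.zeros((NUM_SLOTS, EMBED_DIM), dtype=np.float32)

def slot(ngram: list[int]) -> int:
    """Deterministic address: a pure FNV-style hash of token ids."""
    h = 1469598103934665603
    for t in ngram:
        h = (h ^ t) * 1099511628211 % (1 << 64)
    return h % NUM_SLOTS

def compute_slots(token_ids: list[int]) -> list[int]:
    """All addresses are known up front, before any layer runs."""
    return [slot(token_ids[i - ORDER:i]) for i in range(ORDER, len(token_ids) + 1)]

def fetch_from_host(slots: list[int]) -> np.ndarray:
    """Stand-in for pulling rows from slower, cheaper memory (e.g. host DRAM)."""
    return memory_table[slots]

def lower_layers(token_ids: list[int]) -> int:
    """Placeholder for the compute that overlaps the fetch."""
    return sum(token_ids)

token_ids = [17, 842, 93, 5012, 64]
with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(fetch_from_host, compute_slots(token_ids))  # prefetch
    _ = lower_layers(token_ids)  # compute runs while the fetch is in flight
    rows = future.result()       # embeddings ready when the layers need them
```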
The module further improves efficiency through hierarchical caching of frequently used embeddings and is designed to integrate seamlessly with existing GPU and system memory architectures, potentially avoiding costly upgrades to high-bandwidth memory. This matters particularly in regions such as China, where access to high-performance memory from suppliers like Samsung, SK Hynix, and Micron remains constrained.
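A minimal sketch of such a two-tier cache, assuming an LRU policy (the article does not specify the eviction strategy), with a small fast tier standing in for GPU memory backed by a larger slow tier:

```python
from collections import OrderedDict
import numpy as np

# Hierarchical caching sketch: hot embedding slots live in a small fast
# tier (think GPU HBM), cold slots stay in a large slow tier (host DRAM
# or SSD). Tier sizes and the LRU policy are assumptions.

EMBED_DIM = 64
SLOW_TIER = np.zeros((1 << 16, EMBED_DIM), dtype=np.float32)  # full table
FAST_CAPACITY = 4096                                          # hot-slot budget

class EmbeddingCache:
    """LRU cache keeping frequently hit slots in the fast tier."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.fast = OrderedDict()  # slot id -> embedding row

    def get(self, slot: int) -> np.ndarray:
        if slot in self.fast:                # hit: served from fast memory
            self.fast.move_to_end(slot)
            return self.fast[slot]
        row = SLOW_TIER[slot]                # miss: fetch from slow tier
        self.fast[slot] = row
        if len(self.fast) > self.capacity:  # evict the coldest entry
            self.fast.popitem(last=False)
        return row

cache = EmbeddingCache(FAST_CAPACITY)
vec = cache.get(12345)  # first access misses; repeat accesses hit the fast tier
```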
Initial validations suggest that Engram can expand both parameter scale and reasoning capability while managing memory more efficiently. The approach not only promises to ease memory constraints in AI infrastructure but may also help temper the volatility of DDR5 DRAM pricing. As demand for AI models continues to escalate, methodologies like Engram may prove essential to navigating the hardware and computational efficiency demands of advanced AI applications.