
Top Stories

DeepSeek Unveils Engram Technique to Cut AI Memory Costs by 25% and Enhance Reasoning

DeepSeek, in partnership with Peking University, introduces the Engram technique to enhance AI memory efficiency by 25%, reducing reliance on high-bandwidth DRAM.

DeepSeek, in collaboration with Peking University, has unveiled a new training methodology called Engram, aimed at improving the efficiency of large AI models by decoupling memory storage from computation. Traditional large language models often hit performance bottlenecks and higher costs because they rely on high-bandwidth memory for both knowledge retrieval and computation. That reliance has been a significant factor in the recent surge in DRAM prices, which have reportedly increased fivefold in just ten weeks amid rising hardware demand for ever-larger AI models.

The Engram approach addresses these challenges by enabling efficient “lookups” of essential information, reducing the need for high-speed memory and allowing models to allocate more capacity to complex reasoning tasks. The technique was evaluated on a 27-billion-parameter model, where it demonstrated measurable improvements across standard industry benchmarks. Engram employs hashed N-gram knowledge retrieval, enabling static memory access that is independent of the model’s current operational context.
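To make "static memory access" concrete, here is a minimal sketch of how hashed N-gram retrieval could work. The table size, embedding width, hash scheme, and all names are illustrative assumptions, not DeepSeek's published implementation; the point is that the slot index depends only on the token IDs, never on the model's hidden state.

```python
import hashlib

import numpy as np

NUM_SLOTS = 1 << 16   # illustrative table size
EMBED_DIM = 64        # illustrative embedding width

rng = np.random.default_rng(0)
memory_table = rng.standard_normal((NUM_SLOTS, EMBED_DIM)).astype(np.float32)

def ngram_slot(tokens, n=2):
    """Hash the trailing n-gram of token ids to a fixed table slot.

    The mapping depends only on the tokens themselves, so the lookup
    is static: it can be computed (and prefetched) without knowing
    the model's current hidden state.
    """
    key = ",".join(map(str, tokens[-n:])).encode()
    digest = hashlib.blake2b(key, digest_size=8).digest()
    return int.from_bytes(digest, "little") % NUM_SLOTS

def retrieve(tokens, n=2):
    """Look up the stored embedding for the trailing n-gram."""
    return memory_table[ngram_slot(tokens, n)]

# The same trailing n-gram always maps to the same slot,
# regardless of earlier context (deterministic retrieval):
assert ngram_slot([5, 9, 42]) == ngram_slot([7, 9, 42])
```

Because the slot index is a pure function of the tokens, the lookup can run ahead of, and in parallel with, the model's forward pass.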

This design also includes a context-aware gating mechanism that adjusts retrieved information to align with the model’s hidden state. As a result, Engram enhances the capacity of models to manage long context inputs efficiently while supporting system-level prefetching with minimal performance overhead. This method complements other hardware-efficient strategies, such as solutions from Phison, which offer cost-effective memory expansion options to support large AI infrastructures.
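A context-aware gate of the kind described above can be sketched as a sigmoid gate computed from the hidden state that scales the statically retrieved embedding before it rejoins the residual stream. The gate parameters `W_g` and `b_g` are hypothetical learned weights; this is a sketch of the general mechanism, not DeepSeek's exact formulation.

```python
import numpy as np

def gate_retrieved(hidden, retrieved, W_g, b_g):
    """Scale a statically retrieved embedding by a gate derived from
    the model's current hidden state, then add it residually.

    W_g, b_g are hypothetical learned gate parameters.
    """
    logits = hidden @ W_g + b_g
    gate = 1.0 / (1.0 + np.exp(-logits))   # elementwise sigmoid in (0, 1)
    return hidden + gate * retrieved       # gated residual update

# Illustrative usage with random values:
rng = np.random.default_rng(0)
d = 64
hidden = rng.standard_normal(d)
retrieved = rng.standard_normal(d)
W_g = rng.standard_normal((d, d)) * 0.01
b_g = np.zeros(d)
out = gate_retrieved(hidden, retrieved, W_g, b_g)
```

Since the retrieval itself needs no hidden state, only this cheap gating step sits on the critical path, which is what makes prefetching worthwhile.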

By optimizing memory usage through lookups for static data, Engram promises a more efficient memory model. Phison’s advancements in SSD technology provide a feasible way to expand overall memory capacity, supporting memory-hungry designs such as Engram and Mixture-of-Experts (MoE) systems. Together, these methodologies let AI systems make the most of their fast memory while keeping costs manageable. Engram’s compatibility with emerging Compute Express Link (CXL) standards further helps alleviate the GPU memory bottlenecks common in large AI workloads.

One of the key innovations of Engram is its ability to separate static pattern storage from dynamic computation, which enhances the existing Transformer architecture without necessitating an increase in floating-point operations (FLOPs) or parameter counts. DeepSeek has formalized a U-shaped expansion rule to optimize the allocation of parameters between the MoE conditional computation module and the Engram memory module. Preliminary tests indicate that reallocating approximately 20–25% of the sparse parameter budget to Engram leads to better performance than traditional MoE models while maintaining stable gains across different scales.
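The reported 20–25% reallocation amounts to simple budget arithmetic over the sparse parameter count. The helper below is hypothetical and only restates the split described in the text; the 20B figure in the comment is purely illustrative.

```python
def split_sparse_budget(total_sparse_params: int, engram_frac: float = 0.25):
    """Split a sparse parameter budget between MoE experts and the
    Engram memory module.

    engram_frac in the 0.20-0.25 range is the band the article
    reports as outperforming a pure-MoE baseline.
    """
    if not 0.0 <= engram_frac <= 1.0:
        raise ValueError("engram_frac must be in [0, 1]")
    engram = int(total_sparse_params * engram_frac)
    return {"moe": total_sparse_params - engram, "engram": engram}

# e.g. for an illustrative 20B sparse budget at the 25% setting:
# 15B parameters stay with the MoE experts, 5B move to Engram.
```

Because the split changes only where sparse parameters live, not how many are active per token, total FLOPs and parameter counts stay fixed, consistent with the article's claim.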

Engram’s memory slot expansion facilitates predictable improvements without incurring additional computational costs, confirming the scalability of conditional memory as an independent factor for sparse models. This deterministic retrieval mechanism allows memory capacity to scale linearly across multiple GPUs and supports asynchronous prefetching during inference, effectively offloading static knowledge reconstruction from lower layers. This, in turn, enables attention mechanisms to concentrate on global context.
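Asynchronous prefetching follows directly from deterministic retrieval: since the slot index depends only on the tokens, the transfer from slow memory can start before the consuming layer runs. The sketch below uses a thread pool as a stand-in for a real host-to-GPU copy; all names and sizes are illustrative assumptions.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

NUM_SLOTS, EMBED_DIM = 1 << 12, 32
table = np.zeros((NUM_SLOTS, EMBED_DIM), dtype=np.float32)

def slot_for(tokens):
    # Deterministic: depends only on token ids, so it can be computed
    # before the layer that consumes the embedding executes.
    return hash(tuple(tokens[-2:])) % NUM_SLOTS

def fetch(slot):
    # Stand-in for a copy from slower memory (host DRAM / SSD / CXL)
    # into fast, GPU-resident storage.
    return table[slot].copy()

pool = ThreadPoolExecutor(max_workers=2)

def prefetch(tokens):
    """Kick off the lookup asynchronously; the model overlaps this
    transfer with its own computation and collects the result later."""
    return pool.submit(fetch, slot_for(tokens))

future = prefetch([3, 1, 4])   # start the transfer early
# ... other layers run here, hiding the transfer latency ...
embedding = future.result()    # ready by the time it is needed
```

Overlapping the fetch with computation is what lets static knowledge live in cheaper, slower tiers without stalling inference.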

The module enhances efficiency through hierarchical caching of frequently used embeddings and is designed to integrate seamlessly with existing GPU and system memory architectures, potentially avoiding costly upgrades to high-bandwidth memory. This capability is particularly significant in regions such as China, where access to high-performance memory remains limited compared to competitors like Samsung, SK Hynix, and Micron.
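The hierarchical-caching idea can be sketched as a small LRU cache in front of a slower backing store: hot embeddings stay in fast memory, cold ones fall through to the cheaper tier. The class, capacity, and backing store below are illustrative assumptions, not the module's actual design.

```python
from collections import OrderedDict

class EmbeddingCache:
    """Tiny LRU cache: frequently used slots stay in fast memory,
    misses fall through to a slower backing store."""

    def __init__(self, backing, capacity=4):
        self.backing = backing          # e.g. a host-DRAM or SSD tier
        self.capacity = capacity
        self.fast = OrderedDict()       # stands in for GPU-resident cache
        self.hits = self.misses = 0

    def get(self, slot):
        if slot in self.fast:
            self.fast.move_to_end(slot)    # mark as recently used
            self.hits += 1
            return self.fast[slot]
        self.misses += 1
        value = self.backing[slot]         # slow-path fetch
        self.fast[slot] = value
        if len(self.fast) > self.capacity:
            self.fast.popitem(last=False)  # evict least recently used
        return value

cache = EmbeddingCache(backing={i: f"emb{i}" for i in range(100)}, capacity=2)
cache.get(1); cache.get(2); cache.get(1)   # second access to slot 1 hits
assert (cache.hits, cache.misses) == (1, 2)
```

Keeping only the hot working set in fast memory is the mechanism by which such a design could sidestep high-bandwidth memory upgrades.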

Initial validations suggest that Engram can expand both parameter scale and reasoning capability while managing memory more efficiently. The approach not only promises to ease memory constraints within AI infrastructure but may also help temper the volatility of DDR5 DRAM pricing. As demand for AI models continues to escalate, methodologies like Engram are proving essential in navigating the hardware and computational-efficiency trade-offs required for advanced AI applications.

Written By
AiPressa Staff

The AiPressa Staff team brings you comprehensive coverage of the artificial intelligence industry, including breaking news, research developments, business trends, and policy updates. Our mission is to keep you informed about the rapidly evolving world of AI technology.


© 2025 AIPressa · Part of Buzzora Media · All rights reserved.