
Mistral AI Resolves 400MB/min vLLM Memory Leak by Disabling UCX Memory Hooks

Mistral AI has resolved a critical memory leak, growing at roughly 400MB per minute, in its vLLM-based serving stack by changing UCX’s memory hook settings.

Mistral AI has published a detailed investigation into a memory leak affecting its deployment of vLLM, the open-source inference engine for large language models, which came to light during pre-production testing of its Mistral Medium 3.1 model. The leak, a steady increase in memory consumption, emerged only under specific conditions, namely disaggregated serving with graph compilation enabled, and risked driving servers into an “out of memory” state after a few hours of operation. The write-up, part of Mistral’s new Engineering Deep Dive series, walks through the difficulty of identifying the root cause of such an elusive issue.

The investigation began systematically, with Python memory-profiling tools, before moving to more advanced methods, including kernel-level tracing. Initial attempts with Memray and Guppy 3 yielded nothing, prompting the team to reach out to the vLLM community on GitHub, which confirmed that other users had hit similar issues.
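The negative result from Python-level profilers is itself diagnostic: those tools see only the Python heap, so if the process’s resident set size grows while the traced heap stays flat, the leak must live in native code. A minimal sketch of that check on Linux (the helper names here are illustrative, not Mistral’s tooling):

```python
# Compare what Python-level profilers can see (the traced Python heap)
# against the process's resident set size (RSS). If RSS climbs while the
# traced heap stays flat, the leak is in native code, consistent with
# Memray and Guppy 3 finding nothing. Assumes Linux (/proc/self/status).
import tracemalloc

def rss_kib() -> int:
    """Read the process's resident set size (VmRSS) in KiB from /proc."""
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])
    raise RuntimeError("VmRSS not found")

def heap_vs_rss() -> tuple[int, int]:
    """Return (traced Python heap in bytes, RSS in KiB) at this moment."""
    current, _peak = tracemalloc.get_traced_memory()
    return current, rss_kib()

tracemalloc.start()
heap, rss = heap_vs_rss()
print(f"python heap: {heap} B, rss: {rss} KiB")
```

Sampling these two numbers periodically and plotting them is usually enough to decide whether to keep profiling Python or to drop down to native tools, as the team did next.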

Digging deeper, the team employed Heaptrack, a native memory profiler that records individual memory operations. It showed that while heap memory remained stable, the process’s resident set size (RSS) kept climbing, suggesting the leak lived outside the heap being analyzed. Subsequent monitoring with the pmap command showed that only certain anonymous memory mappings were continuously growing, pointing to memory-resizing system calls such as mremap.
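The pmap-style monitoring described above can be sketched by snapshotting a process’s anonymous mappings from /proc/&lt;pid&gt;/maps and diffing two snapshots; the function names and the choice of keying regions by start address are assumptions for illustration, not Mistral’s actual tooling:

```python
# Snapshot anonymous mappings from a /proc/<pid>/maps dump and report which
# regions grew between two snapshots. A maps line with an empty pathname
# field (5 whitespace-separated fields instead of 6) is an anonymous mapping.
from __future__ import annotations

def anon_mappings(maps_text: str) -> dict[str, int]:
    """Map each anonymous region's start address to its size in bytes."""
    regions: dict[str, int] = {}
    for line in maps_text.splitlines():
        parts = line.split()
        if len(parts) == 5:  # no pathname field -> anonymous mapping
            start_hex, end_hex = parts[0].split("-")
            regions[start_hex] = int(end_hex, 16) - int(start_hex, 16)
    return regions

def grown_regions(before: dict[str, int], after: dict[str, int]) -> dict[str, int]:
    """Regions present in both snapshots whose size increased, with the delta."""
    return {k: after[k] - before[k] for k in before if after.get(k, 0) > before[k]}
```

Running the diff at intervals isolates the handful of mappings that grow without bound, which is exactly the signature the team observed.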

To narrow down the source, the team turned to BPFtrace, a tool for real-time tracing of system calls. This confirmed that the leak came from mmap calls rather than mremap, with each allocation traced back to the glibc syscall wrapper. The remaining challenge was pinpointing the exact call site responsible for the growing allocations.
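The kind of trace used at this stage can be sketched as a bpftrace one-liner that counts mmap entries per user-space stack for one PID; the exact script Mistral ran is not public, so this invocation is an assumption built from standard bpftrace tracepoints:

```python
# Build a bpftrace command that aggregates mmap syscall entries by user
# stack, attributing every allocation to a call path. Requires root and
# bpftrace installed when actually run.
import subprocess

def mmap_trace_cmd(pid: int) -> list[str]:
    """bpftrace one-liner: count mmap entries per user stack for this PID."""
    script = (
        f"tracepoint:syscalls:sys_enter_mmap /pid == {pid}/ "
        "{ @[ustack] = count(); }"
    )
    return ["bpftrace", "-e", script]

def run_trace(pid: int) -> str:
    """Run the tracer and return its aggregated stack report."""
    return subprocess.run(mmap_trace_cmd(pid), capture_output=True, text=True).stdout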

Through targeted automation of GDB, the team set conditional breakpoints on the syscall address, enabling real-time analysis of memory allocations. This process ultimately revealed that the memory leak was attributable to UCX (Unified Communication X), a high-performance communication library employed for data transfer optimizations. The library’s broad interception of mmap calls, particularly for InfiniBand memory management, led to improperly released memory regions that accumulated over time.
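The GDB automation described above can be sketched as a generated batch script: a conditional breakpoint on the syscall wrapper’s address that backtraces every sufficiently large allocation. The address and size threshold are placeholders, and Mistral’s actual scripting is not public:

```python
# Generate a GDB command file that backtraces large mmap calls.
# On x86-64 Linux, mmap's length argument is passed in register rsi, so the
# breakpoint condition filters for allocations of at least min_len bytes.
# Run with: gdb --batch -p <pid> -x <script file>.
def gdb_script(syscall_addr: str, min_len: int) -> str:
    """Return GDB commands: conditional breakpoint, auto-backtrace, resume."""
    return "\n".join([
        f"break *{syscall_addr} if $rsi >= {min_len}",
        "commands",   # run these commands whenever the breakpoint fires
        "silent",
        "bt",         # log the call stack of the allocation
        "continue",
        "end",
        "continue",
        "",
    ])
```

Collecting these backtraces over time is what surfaced UCX frames in every growing allocation.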

Working with the vLLM and UCX teams, Mistral AI identified a fix: disabling the memory-hooking mechanism by setting the environment variable UCX_MEM_MMAP_HOOK_MODE=none. The change eliminated the leak while preserving system performance. The team also found that while UCX maintains a registration cache for InfiniBand operations, the cache’s cleanup was not triggered under certain conditions, allowing unreleased memory to accumulate.
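The fix itself is a one-line environment setting; the only subtlety is ordering, since it must be in place before any library that loads UCX is initialized. Setting it in the launch environment (shell, container spec, systemd unit) works identically; in Python it looks like:

```python
# Disable UCX's mmap interception hooks, the mechanism behind the leak.
# This must run before any component that loads UCX is imported, or the
# hooks are already installed and the setting has no effect.
import os

os.environ["UCX_MEM_MMAP_HOOK_MODE"] = "none"  # disable UCX memory hooks
```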

This investigation illustrates the intricacies involved in diagnosing issues within modern software ecosystems, where multiple layers of dependencies can obscure the source of performance problems. Mistral AI’s experience underscores the importance of collaboration and transparency in addressing such challenges, highlighting the need for continuous refinement in dependency management practices.

Written By: AiPressa Staff

© 2025 AIPressa · Part of Buzzora Media · All rights reserved.