AI Research

Google Unveils TurboQuant: AI Models Use 6x Less Memory Without Performance Loss

Google’s TurboQuant enables AI models to use up to 6x less memory during inference, promising significant efficiency gains without sacrificing performance.

Google engineers have introduced a groundbreaking method for compressing artificial intelligence (AI) data, allowing models to run effectively with up to six times less working memory. The new system, named TurboQuant, lets AI algorithms retain the same amount of information and perform equally robust computations while significantly reducing hardware memory requirements, according to the company.

AI algorithms traditionally demand substantial working memory, often referred to as the key-value (KV) cache, for optimal performance. This cache temporarily stores intermediate computational results and other pertinent information during active processing. For instance, when users query systems such as ChatGPT about the weather, the system stores key terms and contextual data in the KV cache to generate accurate responses. A larger KV cache allows more information to be processed simultaneously, enhancing the AI’s performance.
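To make the cache concrete, here is a minimal sketch of how an autoregressive model reuses cached keys and values at each generation step. Every name, shape, and parameter below is an illustrative assumption, not a detail of Google's or OpenAI's systems:

```python
import numpy as np

# Toy KV cache for a single attention head (illustrative only).
head_dim = 64
k_cache, v_cache = [], []          # grows by one entry per generated token

def attend(query, new_key, new_value):
    """Append this step's key/value, then attend over the whole cache."""
    k_cache.append(new_key)
    v_cache.append(new_value)
    K = np.stack(k_cache)                    # (seq_len, head_dim)
    V = np.stack(v_cache)                    # (seq_len, head_dim)
    scores = K @ query / np.sqrt(head_dim)   # one score per past token
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                 # softmax over the cache
    return weights @ V

for _ in range(5):                           # five generation steps
    q, k, v = np.random.randn(3, head_dim)
    out = attend(q, k, v)

print(f"cache now holds keys and values for {len(k_cache)} tokens")
```

The point of the cache is visible in the loop: each new token attends over every previously stored key and value rather than recomputing them, which is exactly why the memory footprint grows with every token processed.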

A single sentence may use only a few dozen tokens, the basic units of AI prompts and responses, but more advanced tasks can require storing hundreds of thousands of tokens, which translates into memory requirements in the tens of gigabytes. And because ChatGPT handles billions of requests daily, its memory demands scale linearly with user activity.
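The arithmetic behind those gigabytes is straightforward. As a back-of-the-envelope check, using Llama 3.1-8B's published configuration (32 layers, 8 key-value heads, head dimension 128; the article itself gives no per-model figures):

```python
# KV-cache size estimate for Llama 3.1-8B (32 layers, 8 KV heads,
# head dimension 128), with entries stored as 16-bit floats.
layers, kv_heads, head_dim = 32, 8, 128
bytes_per_value = 2                  # fp16 / bf16
tokens = 128_000                     # one long-context request

# Keys and values each hold layers * kv_heads * head_dim numbers per token.
per_token_bytes = 2 * layers * kv_heads * head_dim * bytes_per_value
total_gib = per_token_bytes * tokens / 2**30
print(f"{per_token_bytes // 1024} KiB per token, "
      f"{total_gib:.1f} GiB for {tokens:,} tokens")
# -> 128 KiB per token, 15.6 GiB for 128,000 tokens, per request,
#    before batching many users together.
```

At roughly 16 GiB for a single long-context request, serving many concurrent users quickly pushes KV-cache memory into the range the article describes.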

TurboQuant’s compression algorithm reduces the memory AI models need during these computations through a process called quantization, which represents values with fewer bits. While Google has long employed quantization in its neural networks, it typically applied the technique statically, meaning the compression happens once and does not adapt while the model runs. TurboQuant instead compresses the KV cache dynamically, in real time, a much harder problem because the data must stay accurate and up to date as the model generates outputs.
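As a generic illustration of what quantization does (ordinary symmetric 8-bit rounding, not TurboQuant's actual scheme), this sketch stores 32-bit floats as 8-bit integers plus a single scale factor:

```python
import numpy as np

def quantize(x: np.ndarray, bits: int = 8):
    """Symmetric quantization: map floats onto a small integer grid."""
    qmax = 2 ** (bits - 1) - 1                   # 127 for 8 bits
    scale = np.abs(x).max() / qmax               # one scale per tensor
    q = np.round(x / scale).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

x = np.random.randn(1024).astype(np.float32)
q, scale = quantize(x)
err = np.abs(x - dequantize(q, scale)).max()
print(f"4x fewer bytes than fp32, max round-trip error {err:.4f}")
```

A static scheme computes `scale` once and reuses it; the dynamic setting TurboQuant targets must keep requantizing as new keys and values stream into the cache during generation.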

In recent tests involving Meta’s Llama 3.1-8B, Google’s Gemma, and models from Mistral AI, TurboQuant demonstrated significant potential for alleviating key-value bottlenecks without compromising AI performance. Google representatives noted the findings could have “potentially profound implications for all compression-reliant use cases, particularly in domains like search and AI.”

TurboQuant could theoretically shrink the KV cache by at least a factor of six by combining two techniques: PolarQuant and Quantized Johnson-Lindenstrauss (QJL). Understanding these methods starts with the fact that data in an AI model’s working memory is stored as vectors, ordered lists of numbers that have both a magnitude and a direction. PolarQuant converts this data from Cartesian coordinates into polar coordinates, which aligns vector angles more consistently and makes them easier to compress. The QJL method then adjusts the vectors slightly to correct computational errors introduced by the quantization.
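The published PolarQuant and QJL algorithms are more involved than this two-sentence summary, but the geometric ideas can be sketched. Everything below (the pair-wise polar conversion, the 1-bit sign sketch, and all parameter choices) is an assumption for illustration, not the paper's actual construction:

```python
import numpy as np

rng = np.random.default_rng(0)
d, m = 64, 512        # vector dimension, projection dimension (assumed)

def polar_quantize(v: np.ndarray, angle_bits: int = 4) -> np.ndarray:
    """PolarQuant idea (sketch): treat consecutive coordinate pairs as 2-D
    points and store each as a radius plus a coarsely quantized angle."""
    x, y = v[0::2], v[1::2]
    r, theta = np.hypot(x, y), np.arctan2(y, x)
    step = 2 * np.pi / 2 ** angle_bits
    theta_q = np.round(theta / step) * step      # snap angle to the grid
    out = np.empty_like(v)
    out[0::2], out[1::2] = r * np.cos(theta_q), r * np.sin(theta_q)
    return out

# QJL idea (sketch): a Johnson-Lindenstrauss random projection followed by
# 1-bit sign quantization; angles between vectors survive approximately.
P = rng.standard_normal((m, d))

def qjl_sketch(v: np.ndarray) -> np.ndarray:
    return np.sign(P @ v)

a, b = rng.standard_normal(d), rng.standard_normal(d)
cos_true = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
disagree = np.mean(qjl_sketch(a) != qjl_sketch(b))
cos_est = np.cos(np.pi * disagree)    # classic sign-sketch angle estimate
print(f"true cosine {cos_true:+.3f}, 1-bit estimate {cos_est:+.3f}")
```

The sign sketch preserves angles between vectors even though each projected coordinate keeps only one bit, which is why a rough cosine similarity, the quantity attention scores depend on, can be recovered from heavily compressed data.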

Matthew Prince, CEO of Cloudflare, called the breakthrough “Google’s DeepSeek moment,” drawing a parallel to the unexpected release of a Chinese AI model that achieved remarkable results at much lower cost. The unveiling of TurboQuant on March 24 sent shares of memory companies such as SanDisk, Western Digital, and Seagate sharply lower. Despite its potential to improve AI efficiency, the technology remains in the laboratory phase and has not yet seen widespread deployment.

It is worth noting that TurboQuant only compresses working memory during inference, the process of generating responses to prompts. Training these models often requires up to four times more memory than inference, so the overall impact on memory usage may be relatively modest. As Merrill Lynch analyst Vivek Arya told concerned investors, the “6x improvement in memory efficiency [will] likely [lead] to 6x increase in accuracy (model size) and/or context length (KV cache allocation), rather than a 6x decrease in memory.”

Google officially introduced TurboQuant at the ICLR 2026 conference, held April 23-27 in Rio de Janeiro, and will present the PolarQuant and QJL techniques at AISTATS 2026 in Tangier, Morocco, in early May, signaling a promising future for AI data compression and efficiency.


