Inception Launches Mercury 2, a Diffusion LLM 10x Faster than OpenAI’s Models

Inception unveils Mercury 2, a diffusion LLM delivering up to 10x faster performance than OpenAI’s models, transforming AI application development.

Inception launched its new large language model, Mercury 2, last week, marking a significant shift in the generative AI landscape. Unlike traditional autoregressive models used by major AI labs, Mercury 2 employs a diffusion approach, as explained by Inception CEO and co-founder Stefano Ermon during a recent episode of The New Stack Agents. This innovative model is expected to reshape how AI applications are developed, offering advantages in both speed and efficiency.

Traditional large language models (LLMs) generate text sequentially, processing one token at a time from left to right, a method Ermon likens to “fancy autocomplete.” In contrast, diffusion models begin with an approximate output and refine it in parallel, similar to how image models like Stable Diffusion convert noise into coherent images. Inception’s own testing indicates that Mercury 2 can produce more than 1,000 tokens per second, achieving speeds five to ten times faster than optimized models from industry leaders such as OpenAI, Anthropic, and Google.
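The speed difference described above comes down to how many model calls are needed to produce a sequence. A toy sketch (not Inception's actual algorithm, just an illustration of the counting argument) shows why parallel refinement needs far fewer passes than left-to-right decoding:

```python
def autoregressive_decode(target):
    """Left-to-right generation: one model call per token."""
    out, calls = [], 0
    for tok in target:
        out.append(tok)          # each new token requires its own forward pass
        calls += 1
    return out, calls

def diffusion_decode(target, steps=4):
    """Start from an all-masked sequence and refine positions in parallel.
    Each refinement step is one model call, regardless of sequence length."""
    n = len(target)
    out = ["<mask>"] * n
    calls = 0
    per_step = -(-n // steps)    # ceil division: positions committed per step
    for s in range(steps):
        calls += 1
        lo, hi = s * per_step, min((s + 1) * per_step, n)
        out[lo:hi] = target[lo:hi]   # toy stand-in for denoising a block
    return out, calls

target = "the quick brown fox jumps over the lazy dog".split()
ar_out, ar_calls = autoregressive_decode(target)
df_out, df_calls = diffusion_decode(target, steps=4)
print(ar_calls, df_calls)  # 9 model calls vs 4
```

The real model predicts tokens rather than copying them, but the cost structure is the same: autoregressive latency grows with output length, while a diffusion model's step count can stay fixed.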

Ermon noted, “What we’re seeing is that our Mercury 2 model, which is a reasoning model, is actually able to match the quality of these speed-optimized models from frontier labs OpenAI, Anthropic, Meta, and Google, while being five to ten times faster in terms of, like, the end-to-end latency, how long you need to wait before it gives you an answer.” This capability is particularly significant as the demand for rapid response times in AI applications continues to grow.

The slower performance of autoregressive models stems from their sequential decoding, which is bound by memory bandwidth: each new token requires another full pass over the model's weights. Diffusion models instead refine many positions per pass, a parallel workload that better matches the architecture of modern GPUs. Nvidia, a key investor in Inception, is also assisting in optimizing Mercury 2's serving engine to enhance performance further.

Ermon, who has a background in developing diffusion models for images during his time at Stanford, highlighted the trade-offs involved in this new technology. While Mercury 2 is capable of matching the quality of Claude Haiku and Google Flash-class models, it does not yet reach the performance level of Claude Opus or OpenAI’s GPT-4. However, Ermon maintains that as models scale, the economic advantages of the diffusion approach will become increasingly compelling. He emphasized that reinforcement learning, a technique foundational to current reasoning models, benefits from the efficiency of diffusion architectures, particularly in addressing inference bottlenecks.

Currently, Inception stands out as the only company offering a production-level diffusion LLM, with Google’s text diffusion model still classified as “experimental.” Mercury 2 is now accessible via an OpenAI-compatible API, with plans for integration into AWS Bedrock expected soon.
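Because the API is OpenAI-compatible, calling Mercury 2 should look like a standard chat-completions request pointed at a different base URL. The endpoint and model identifier below are placeholders (consult Inception's documentation for the real values); this sketch only assembles the request:

```python
import json
import urllib.request

# Hypothetical values -- check Inception's docs for the actual endpoint and model id.
BASE_URL = "https://api.example-inception.ai/v1"
MODEL = "mercury-2"

def build_request(prompt, api_key="YOUR_KEY"):
    """Assemble an OpenAI-style chat completion request for a compatible endpoint."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

req = build_request("Summarize diffusion LLMs in one sentence.")
print(req.full_url)
# To send it: urllib.request.urlopen(req)  (requires a valid API key)
```

This compatibility means existing OpenAI client libraries can typically be redirected by overriding their base URL, with no other code changes.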

As the competitive landscape of AI continues to evolve, the introduction of Mercury 2 may signal a broader shift in the industry, highlighting the potential for new methodologies to redefine traditional approaches to AI development.

Written By: AiPressa Staff

The AiPressa Staff team brings you comprehensive coverage of the artificial intelligence industry, including breaking news, research developments, business trends, and policy updates. Our mission is to keep you informed about the rapidly evolving world of AI technology.


© 2025 AIPressa · Part of Buzzora Media · All rights reserved.