AI startup Inception Labs has unveiled Mercury 2, which it describes as the first diffusion-based reasoning AI model, promising significant gains in speed and cost. Unlike traditional language models, which generate text one token at a time, Mercury 2 refines multiple blocks of text in parallel, much as an editor revises an entire draft at once. This approach could reshape applications that depend on rapid text processing.
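The difference between the two decoding styles can be shown with a deliberately simplified toy sketch. This is not Inception Labs' actual algorithm (no details of it appear here); it only illustrates why parallel refinement needs fewer sequential steps than token-by-token generation.

```python
# Toy contrast between autoregressive decoding (one token per step)
# and diffusion-style decoding (all positions refined each round).
# Purely illustrative; not Mercury 2's real procedure.

TARGET = ["the", "quick", "brown", "fox"]

def autoregressive_decode(target):
    """Emit one token per step: N tokens cost N sequential steps."""
    out, steps = [], 0
    for tok in target:
        out.append(tok)  # each token waits on all previous tokens
        steps += 1
    return out, steps

def diffusion_decode(target, rounds=2):
    """Start fully masked; each round refines every position at once."""
    seq = ["[MASK]"] * len(target)
    for step in range(rounds):
        # Toy schedule: a growing prefix is "denoised" each round.
        keep = (step + 1) * len(target) // rounds
        seq = [target[i] if i < keep else "[MASK]" for i in range(len(target))]
    return seq, rounds

print(autoregressive_decode(TARGET))  # 4 sequential steps
print(diffusion_decode(TARGET))       # 2 parallel refinement rounds
```

The sequential-step count, not the total work, is what drives the latency gap: a GPU can apply one refinement round to every position at once, while autoregressive decoding must wait for each token before producing the next.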
Inception Labs reports that Mercury 2 achieves a remarkable performance of 1,009 tokens per second on Nvidia Blackwell GPUs, with an end-to-end latency of just 1.7 seconds. This is a stark contrast to competitors such as Gemini 3 Flash, which operates with a latency of 14.4 seconds, and Claude Haiku 4.5, which takes 23.4 seconds with reasoning enabled. The company asserts that the output quality of Mercury 2 is comparable to leading speed-optimized models currently on the market.
Mercury 2's pricing is also notably competitive: $0.25 per million input tokens and $0.75 per million output tokens. That is half Gemini 3 Flash's input price and a quarter of its output price, and roughly a quarter of Claude Haiku 4.5's rates in both categories. This positions Mercury 2 as a cost-effective option for businesses looking to expand their AI capabilities.
| Model | E2E Latency (Seconds) |
|---|---|
| Mercury 2 | 1.7 |
| GPT-5 Nano (Minimal) | 4.5 |
| Claude 4.5 Haiku (Non-Reasoning) | 5.0 |
| Gemini 2.5 Flash-Lite (Reasoning) | 7.8 |
| Gemini 3 Flash (Reasoning) | 14.4 |
| Gemini 2.5 Flash (Reasoning) | 15.6 |
| GPT-5 Mini (Medium) | 22.8 |
| Claude 4.5 Haiku (Reasoning) | 23.4 |
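The per-million-token prices quoted above translate into per-request costs with simple arithmetic. In the sketch below, Mercury 2's prices come from this article; the Gemini 3 Flash figures are only implied by the stated ratios (2x input, 4x output), not confirmed list prices.

```python
# Back-of-the-envelope cost estimate from per-million-token prices.

def cost_usd(input_tokens, output_tokens, in_price, out_price):
    """in_price/out_price are USD per million tokens."""
    return input_tokens / 1e6 * in_price + output_tokens / 1e6 * out_price

# Example workload: 2M input tokens, 500K output tokens.
mercury = cost_usd(2_000_000, 500_000, 0.25, 0.75)   # prices per the article
gemini = cost_usd(2_000_000, 500_000, 0.50, 3.00)    # implied by the 2x/4x ratios

print(f"Mercury 2:      ${mercury:.2f}")  # $0.88
print(f"Gemini 3 Flash: ${gemini:.2f}")   # $2.50
```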
Mercury 2 also supports a 128K context window, tool usage, and JSON output, making it appealing for companies developing latency-sensitive applications, including voice assistants, coding tools, and search systems. The model is now accessible through an OpenAI-compatible API, allowing companies to apply for early access and test its capabilities directly in chat interfaces.
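Because the API is OpenAI-compatible, a request should look like a standard chat completion payload. The sketch below only assembles that payload; the model name `mercury-2` and the endpoint path are assumptions for illustration, so check Inception Labs' documentation for the actual values.

```python
import json

def build_chat_request(prompt, model="mercury-2", json_mode=False):
    """Assemble an OpenAI-style chat completion payload (not sent here)."""
    payload = {
        "model": model,  # assumed model identifier; verify with the provider
        "messages": [{"role": "user", "content": prompt}],
    }
    if json_mode:
        # JSON output mode, which the article says Mercury 2 supports
        payload["response_format"] = {"type": "json_object"}
    return payload

req = build_chat_request("Summarize this support ticket.", json_mode=True)
print(json.dumps(req, indent=2))
# POST this body to the provider's chat-completions endpoint with an API key,
# e.g. via the official openai client pointed at a custom base_url.
```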
The release of Mercury 2 comes after Inception Labs raised $50 million in funding last November from notable investors including Microsoft, Nvidia, and Snowflake. This funding has enabled the startup to transition from a prototype showcased in early 2025 to a production-ready model with practical applications.
Inception Labs is not the only company exploring alternatives to conventional AI models. Google DeepMind is also developing diffusion-based language models, with their Gemini Diffusion demonstrating competitive performance against the Gemini 2.0 Flash Lite model in initial benchmarks. However, Google has not provided updates on these efforts since their introduction in May 2025.
The growing interest in diffusion-based models reflects a broader trend in the AI industry as startups seek alternatives to the prevailing Transformer architecture. While the longevity and viability of these models in the market remain to be determined, they represent a significant shift in the exploration of AI technologies.