
Inception Launches Mercury 2: Fastest Diffusion-Based AI Model at $0.25 per Million Tokens

Inception Labs launches Mercury 2, the fastest diffusion-based AI model at 1,009 tokens per second, priced at just $0.25 per million input tokens.

AI startup Inception Labs has unveiled Mercury 2, the first diffusion-based reasoning AI model, promising significant improvements in speed and cost. Unlike traditional language models that process text word by word, Mercury 2 refines multiple text blocks simultaneously, akin to an editor revising an entire draft at once. This approach has the potential to revolutionize applications requiring rapid text processing.

Inception Labs reports that Mercury 2 achieves a remarkable performance of 1,009 tokens per second on Nvidia Blackwell GPUs, with an end-to-end latency of just 1.7 seconds. This is a stark contrast to competitors such as Gemini 3 Flash, which operates with a latency of 14.4 seconds, and Claude Haiku 4.5, which takes 23.4 seconds with reasoning enabled. The company asserts that the output quality of Mercury 2 is comparable to leading speed-optimized models currently on the market.

The pricing structure for Mercury 2 is also notably competitive, with costs set at $0.25 per million input tokens and $0.75 per million output tokens. That is roughly half Gemini 3 Flash's input price and a quarter of its output price, and approximately a quarter of Claude Haiku 4.5's prices in both categories. This pricing strategy positions Mercury 2 as a cost-effective option for businesses looking to enhance their AI capabilities.
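At those per-token rates, estimating a monthly bill is simple arithmetic. A minimal sketch, using the published prices from this article and an illustrative workload:

```python
# Mercury 2's published rates, converted to dollars per token.
INPUT_RATE = 0.25 / 1_000_000   # $0.25 per million input tokens
OUTPUT_RATE = 0.75 / 1_000_000  # $0.75 per million output tokens

def usage_cost(input_tokens: int, output_tokens: int) -> float:
    """Total cost in dollars for a given token volume."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# Illustrative workload: 10M input tokens and 2M output tokens per month.
monthly = usage_cost(10_000_000, 2_000_000)
print(f"${monthly:.2f}")  # → $4.00
```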

Model                                E2E Latency (s)
Mercury 2                            1.7
GPT-5 Nano (Minimal)                 4.5
Claude 4.5 Haiku (Non-Reasoning)     5.0
Gemini 2.5 Flash-Lite (Reasoning)    7.8
Gemini 3 Flash (Reasoning)           14.4
Gemini 2.5 Flash (Reasoning)         15.6
GPT-5 Mini (Medium)                  22.8
Claude 4.5 Haiku (Reasoning)         23.4

Mercury 2 also supports a 128K context window, tool usage, and JSON output, making it appealing for companies developing latency-sensitive applications, including voice assistants, coding tools, and search systems. The model is now accessible through an OpenAI-compatible API, allowing companies to apply for early access and test its capabilities directly in chat interfaces.
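Because the API is OpenAI-compatible, requests follow the familiar chat-completions format. A minimal sketch of building such a request body; the model name "mercury-2" and the use of response_format for JSON output are illustrative assumptions, not documented values from Inception Labs:

```python
import json

def build_chat_request(model: str, prompt: str, json_output: bool = False) -> dict:
    """Build a request body in the OpenAI chat-completions format."""
    body = {
        "model": model,  # assumed model identifier, for illustration only
        "messages": [{"role": "user", "content": prompt}],
    }
    if json_output:
        # OpenAI-compatible APIs commonly accept response_format
        # to request structured JSON output.
        body["response_format"] = {"type": "json_object"}
    return body

request = build_chat_request("mercury-2", "Summarize this ticket.", json_output=True)
print(json.dumps(request, indent=2))
```

An actual call would POST this body to the provider's chat-completions endpoint with an API key obtained through the early-access program.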

The release of Mercury 2 comes after Inception Labs raised $50 million in funding last November from notable investors including Microsoft, Nvidia, and Snowflake. This funding has enabled the startup to transition from a prototype showcased in early 2025 to a production-ready model with practical applications.

Inception Labs is not the only company exploring alternatives to conventional AI models. Google DeepMind is also developing diffusion-based language models, with their Gemini Diffusion demonstrating competitive performance against the Gemini 2.0 Flash Lite model in initial benchmarks. However, Google has not provided updates on these efforts since their introduction in May 2025.

The growing interest in diffusion-based models reflects a broader trend in the AI industry as startups seek alternatives to the prevailing Transformer architecture. While the longevity and viability of these models in the market remain to be determined, they represent a significant shift in the exploration of AI technologies.

Written by AIPressa Staff

The AiPressa Staff team brings you comprehensive coverage of the artificial intelligence industry, including breaking news, research developments, business trends, and policy updates. Our mission is to keep you informed about the rapidly evolving world of AI technology.


© 2025 AIPressa · Part of Buzzora Media · All rights reserved.