AI startup Inception Labs has unveiled Mercury 2, which it describes as the first diffusion-based reasoning AI model, promising significant gains in speed and cost. Unlike traditional language models, which generate text one token at a time, Mercury 2 refines multiple blocks of text in parallel, much as an editor revises an entire draft at once. This approach could reshape applications that depend on rapid text processing.
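The difference between the two decoding styles can be shown with a deliberately simplified toy sketch. This is not Inception Labs' actual algorithm (no details of it appear here); it only illustrates why parallel refinement needs fewer sequential steps than token-by-token generation.

```python
# Toy contrast between autoregressive decoding (one token per step)
# and diffusion-style decoding (all positions refined each round).
# Purely illustrative; not Mercury 2's real procedure.

TARGET = ["the", "quick", "brown", "fox"]

def autoregressive_decode(target):
    """Emit one token per step: N tokens cost N sequential steps."""
    out, steps = [], 0
    for tok in target:
        out.append(tok)  # each token waits on all previous tokens
        steps += 1
    return out, steps

def diffusion_decode(target, rounds=2):
    """Start fully masked; each round refines every position at once."""
    seq = ["[MASK]"] * len(target)
    for step in range(rounds):
        # Toy schedule: a growing prefix is "denoised" each round.
        keep = (step + 1) * len(target) // rounds
        seq = [target[i] if i < keep else "[MASK]" for i in range(len(target))]
    return seq, rounds

print(autoregressive_decode(TARGET))  # 4 sequential steps
print(diffusion_decode(TARGET))       # 2 parallel refinement rounds
```

The sequential-step count, not the total work, is what drives the latency gap: a GPU can apply one refinement round to every position at once, while autoregressive decoding must wait for each token before producing the next.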
Inception Labs reports that Mercury 2 achieves a remarkable performance of 1,009 tokens per second on Nvidia Blackwell GPUs, with an end-to-end latency of just 1.7 seconds. This is a stark contrast to competitors such as Gemini 3 Flash, which operates with a latency of 14.4 seconds, and Claude Haiku 4.5, which takes 23.4 seconds with reasoning enabled. The company asserts that the output quality of Mercury 2 is comparable to leading speed-optimized models currently on the market.
Mercury 2's pricing is also notably competitive: $0.25 per million input tokens and $0.75 per million output tokens. That is half Gemini 3 Flash's input price and a quarter of its output price, and roughly a quarter of Claude Haiku 4.5's rates in both categories. This positions Mercury 2 as a cost-effective option for businesses looking to expand their AI capabilities.
| Model | E2E Latency (Seconds) |
|---|---|
| Mercury 2 | 1.7 |
| GPT-5 Nano (Minimal) | 4.5 |
| Claude 4.5 Haiku (Non-Reasoning) | 5.0 |
| Gemini 2.5 Flash-Lite (Reasoning) | 7.8 |
| Gemini 3 Flash (Reasoning) | 14.4 |
| Gemini 2.5 Flash (Reasoning) | 15.6 |
| GPT-5 Mini (Medium) | 22.8 |
| Claude 4.5 Haiku (Reasoning) | 23.4 |
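The per-million-token prices quoted above translate into per-request costs with simple arithmetic. In the sketch below, Mercury 2's prices come from this article; the Gemini 3 Flash figures are only implied by the stated ratios (2x input, 4x output), not confirmed list prices.

```python
# Back-of-the-envelope cost estimate from per-million-token prices.

def cost_usd(input_tokens, output_tokens, in_price, out_price):
    """in_price/out_price are USD per million tokens."""
    return input_tokens / 1e6 * in_price + output_tokens / 1e6 * out_price

# Example workload: 2M input tokens, 500K output tokens.
mercury = cost_usd(2_000_000, 500_000, 0.25, 0.75)   # prices per the article
gemini = cost_usd(2_000_000, 500_000, 0.50, 3.00)    # implied by the 2x/4x ratios

print(f"Mercury 2:      ${mercury:.2f}")  # $0.88
print(f"Gemini 3 Flash: ${gemini:.2f}")   # $2.50
```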
Mercury 2 also supports a 128K context window, tool usage, and JSON output, making it appealing for companies developing latency-sensitive applications, including voice assistants, coding tools, and search systems. The model is now accessible through an OpenAI-compatible API, allowing companies to apply for early access and test its capabilities directly in chat interfaces.
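Because the API is OpenAI-compatible, a request should look like a standard chat completion payload. The sketch below only assembles that payload; the model name `mercury-2` and the endpoint path are assumptions for illustration, so check Inception Labs' documentation for the actual values.

```python
import json

def build_chat_request(prompt, model="mercury-2", json_mode=False):
    """Assemble an OpenAI-style chat completion payload (not sent here)."""
    payload = {
        "model": model,  # assumed model identifier; verify with the provider
        "messages": [{"role": "user", "content": prompt}],
    }
    if json_mode:
        # JSON output mode, which the article says Mercury 2 supports
        payload["response_format"] = {"type": "json_object"}
    return payload

req = build_chat_request("Summarize this support ticket.", json_mode=True)
print(json.dumps(req, indent=2))
# POST this body to the provider's chat-completions endpoint with an API key,
# e.g. via the official openai client pointed at a custom base_url.
```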
The release of Mercury 2 comes after Inception Labs raised $50 million in funding last November from notable investors including Microsoft, Nvidia, and Snowflake. This funding has enabled the startup to transition from a prototype showcased in early 2025 to a production-ready model with practical applications.
Inception Labs is not the only company exploring alternatives to conventional AI models. Google DeepMind is also developing diffusion-based language models, with their Gemini Diffusion demonstrating competitive performance against the Gemini 2.0 Flash Lite model in initial benchmarks. However, Google has not provided updates on these efforts since their introduction in May 2025.
The growing interest in diffusion-based models reflects a broader trend in the AI industry as startups seek alternatives to the prevailing Transformer architecture. While the longevity and viability of these models in the market remain to be determined, they represent a significant shift in the exploration of AI technologies.