In a significant advance for speech recognition, Mistral has unveiled Voxtral Transcribe 2, a family of next-generation speech-to-text models built for high transcription quality, speaker diarization, and very low latency. The release includes two models: Voxtral Mini Transcribe V2 for batch transcription and Voxtral Realtime for live applications, both aimed at voice-driven workflows across industries. Voxtral Realtime is also available with open weights under the Apache 2.0 license, making it broadly accessible to developers.
The newly launched audio playground in Mistral Studio lets users try Voxtral Transcribe 2 immediately, with features such as diarization and timestamps. The launch reflects Mistral's aim of making powerful transcription tools readily available for a wide range of applications.
Voxtral Mini Transcribe V2 offers state-of-the-art transcription with speaker diarization, context biasing, and word-level timestamps across 13 languages. It pairs industry-leading accuracy with a price of just $0.003 per minute, making it highly competitive. Voxtral Realtime, by contrast, is optimized for live transcription, with latency configurable down to under 200 milliseconds, which is crucial for voice agents and other real-time applications.
Voxtral Realtime uses a novel streaming architecture that transcribes audio as it arrives, minimizing delay. At a 2.4-second delay it matches the performance of Mini Transcribe V2, and at 480 milliseconds its word error rate remains within 1-2%. This opens new possibilities for voice-first applications, and the model performs robustly across many languages, including English, Chinese, Hindi, and Spanish.
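The delay setting trades responsiveness against how much audio the model sees before committing to text. The announcement does not describe how Voxtral Realtime implements this, so the following is only a simplified illustration of a fixed-delay buffer, not Mistral's architecture. It assumes audio arrives as fixed 80 ms chunks (`CHUNK_MS` is an assumed value) and that each released window would be handed to some transcription model.

```python
from collections.abc import Iterable, Iterator

CHUNK_MS = 80  # assumed duration of each incoming audio chunk, in milliseconds


def delayed_windows(chunks: Iterable[bytes], delay_ms: int) -> Iterator[tuple[int, bytes]]:
    """Buffer incoming chunks and release them in windows of `delay_ms` audio.

    Yields (emit_time_ms, window_audio), where emit_time_ms is the stream
    position at which the window is complete and could be handed to a model.
    """
    if delay_ms <= 0 or delay_ms % CHUNK_MS:
        raise ValueError("delay_ms must be a positive multiple of CHUNK_MS")
    chunks_per_window = delay_ms // CHUNK_MS
    buffer: list[bytes] = []
    elapsed_ms = 0
    for chunk in chunks:
        buffer.append(chunk)
        elapsed_ms += CHUNK_MS
        if len(buffer) == chunks_per_window:
            yield elapsed_ms, b"".join(buffer)
            buffer.clear()
    if buffer:  # flush a final, shorter window when the stream ends
        yield elapsed_ms, b"".join(buffer)
```

With 80 ms chunks, `delay_ms=480` releases audio every six chunks, while `delay_ms=2400` waits for thirty. Real streaming speech models emit text incrementally with learned look-ahead rather than in disjoint windows, so treat this purely as an illustration of the delay knob.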
Mini Transcribe V2 significantly improves transcription and diarization quality, reaching a word error rate of approximately 4% on the FLEURS benchmark and outperforming competitors such as GPT-4o Mini Transcribe and Deepgram Nova on accuracy. It also processes audio three times faster than ElevenLabs' Scribe v2 at a fraction of the cost, which gives it a standout price-performance ratio.
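Word error rate, the metric behind the ~4% FLEURS figure, is the word-level edit distance (substitutions + deletions + insertions) between a reference transcript and the model's hypothesis, divided by the number of reference words. A minimal sketch, assuming simple lowercasing and whitespace tokenization; published benchmarks apply their own text normalization, so results from this function will not reproduce reported scores exactly.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (S + D + I) / N, via word-level Levenshtein distance."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # At the start of row i, prev[j] is the distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return prev[-1] / len(ref)
```

For example, `word_error_rate("the cat sat on the mat", "the cat sit on mat")` counts one substitution and one deletion over six reference words, giving about 0.333.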
Key features of Mini Transcribe V2 include speaker diarization with precise start and end timestamps for each speaker segment, which makes it well suited to meeting transcription and interview analysis. Context biasing lets users supply specific phrases, such as technical terms or proper nouns, to guide recognition, which is particularly valuable in specialized industries. The model also stays accurate in noisy environments and can process recordings of up to three hours in a single request.
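Diarized output is typically a list of segments, each carrying a speaker label plus start and end timestamps. The sketch below uses a hypothetical `Segment` shape (it is not the documented Voxtral response schema) and shows a common post-processing step: merging consecutive segments from the same speaker into single turns for a readable transcript. The `max_gap` threshold is an arbitrary assumption.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    # Hypothetical shape; field names are illustrative, not Mistral's schema.
    speaker: str
    start: float  # seconds
    end: float    # seconds
    text: str


def merge_turns(segments: list[Segment], max_gap: float = 1.0) -> list[Segment]:
    """Merge adjacent same-speaker segments separated by at most max_gap seconds."""
    merged: list[Segment] = []
    for seg in sorted(segments, key=lambda s: s.start):
        last = merged[-1] if merged else None
        if last is not None and last.speaker == seg.speaker and seg.start - last.end <= max_gap:
            last.end = max(last.end, seg.end)
            last.text = f"{last.text} {seg.text}"
        else:
            # Copy so the caller's segments are never mutated.
            merged.append(Segment(seg.speaker, seg.start, seg.end, seg.text))
    return merged


def format_turns(turns: list[Segment]) -> str:
    """Render one line per turn: [start-end] speaker: text."""
    return "\n".join(f"[{t.start:.2f}-{t.end:.2f}] {t.speaker}: {t.text}" for t in turns)
```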
Mistral's new audio playground lets users upload multiple audio files, toggle diarization, and choose timestamp granularity; it supports a range of audio formats, with files of up to 1 GB each. The interactive environment makes it easy to test the new transcription capabilities right away.
Voxtral's innovations are poised to reshape voice applications across many sectors. For meeting intelligence, the models can accurately transcribe multilingual recordings with clear speaker attribution, making meeting content easier to annotate. They also support responsive voice interfaces for virtual assistants when combined with large language models and text-to-speech systems.
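One simple meeting-intelligence metric that speaker attribution enables is talk time per speaker. This sketch assumes diarized segments are available as hypothetical `(speaker, start_seconds, end_seconds)` tuples; overlapping speech is counted toward every speaker involved.

```python
from collections import defaultdict


def talk_time(segments: list[tuple[str, float, float]]) -> dict[str, float]:
    """Total seconds spoken per speaker."""
    totals: defaultdict[str, float] = defaultdict(float)
    for speaker, start, end in segments:
        if end < start:
            raise ValueError(f"segment for {speaker!r} ends before it starts")
        totals[speaker] += end - start
    return dict(totals)


def talk_share(segments: list[tuple[str, float, float]]) -> dict[str, float]:
    """Each speaker's fraction of total speaking time (empty if nobody spoke)."""
    totals = talk_time(segments)
    overall = sum(totals.values())
    return {spk: secs / overall for spk, secs in totals.items()} if overall else {}
```

For a meeting where speaker A talks from 0 to 30 s and 40 to 60 s and speaker B from 30 to 40 s, `talk_time` gives 50 s for A and 10 s for B.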
In contact centers, real-time transcription lets AI systems analyze sentiment and populate customer relationship management (CRM) fields during live conversations. Media and broadcast teams can generate live multilingual subtitles with minimal latency, and compliance and documentation workflows are streamlined by accurate monitoring and transcription of interactions.
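Word-level timestamps are what make subtitle generation straightforward: timed words are grouped into short cues and written in a subtitle format such as SubRip (SRT). A batch-oriented sketch, assuming words arrive as hypothetical `(text, start_seconds, end_seconds)` tuples and using an arbitrary limit of seven words per cue; a live system would emit cues incrementally as words are finalized.

```python
def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def to_srt(words: list[tuple[str, float, float]], max_words: int = 7) -> str:
    """Group timestamped words into numbered SRT cues of at most max_words words."""
    cues = []
    for n, i in enumerate(range(0, len(words), max_words), start=1):
        group = words[i : i + max_words]
        text = " ".join(w for w, _, _ in group)
        # Cue number, time range from first word start to last word end, text.
        cues.append(f"{n}\n{srt_time(group[0][1])} --> {srt_time(group[-1][2])}\n{text}\n")
    return "\n".join(cues)  # a blank line separates consecutive cues
```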
Both models support deployments that comply with regulations such as GDPR and HIPAA, whether on-premise or in a private cloud, reinforcing Mistral's focus on secure deployment options.
Both models are available now. Voxtral Mini Transcribe V2 can be accessed via API for $0.003 per minute, while Voxtral Realtime costs $0.006 per minute and is also released as open weights on the Hugging Face Hub. Mistral encourages interested developers to explore its audio and transcription capabilities through its documentation and invites those passionate about speech AI to consider joining its team.
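At the listed per-minute rates, estimating a bill is simple arithmetic. The sketch below uses only the two prices quoted in this article; the dictionary keys are hypothetical labels rather than official model identifiers, and it assumes billing is strictly proportional to audio duration, ignoring any rounding, minimums, or volume discounts the announcement does not describe.

```python
from decimal import Decimal

# Per-minute API prices quoted in the announcement, in USD.
# Keys are illustrative labels, not Mistral model IDs.
PRICE_PER_MINUTE = {
    "mini-transcribe-v2": Decimal("0.003"),
    "realtime": Decimal("0.006"),
}


def estimate_cost(model: str, audio_minutes: float) -> Decimal:
    """Estimated USD cost, assuming cost scales linearly with audio minutes."""
    # Convert via str so values like 0.1 do not carry binary float error.
    return PRICE_PER_MINUTE[model] * Decimal(str(audio_minutes))
```

A single three-hour recording (180 minutes) comes to $0.54 with Mini Transcribe V2 and $1.08 with Voxtral Realtime; 1,000 hours of audio (60,000 minutes) would be about $180 and $360 respectively.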