Mistral Launches Voxtral Transcribe 2 with 13-Language Support and Sub-200ms Latency

Mistral launches Voxtral Transcribe 2, with support for 13 languages, configurable latency below 200 ms for live transcription, and batch transcription priced at $0.003 per minute.

Mistral has unveiled Voxtral Transcribe 2, a suite of next-generation speech-to-text models built for transcription quality, speaker diarization, and ultra-low latency. The release includes two models: Voxtral Mini Transcribe V2 for batch transcription and Voxtral Realtime for live applications. Voxtral Realtime is available with open weights under the Apache 2.0 license, which lets developers run and adapt the model themselves.

A new audio playground in Mistral Studio lets users try Voxtral Transcribe 2 immediately, with features such as diarization and timestamps.

Voxtral Mini Transcribe V2 offers speaker diarization, context biasing, and word-level timestamps across 13 languages. Mistral positions it as combining industry-leading accuracy with a price of $0.003 per minute. Voxtral Realtime, by contrast, is optimized for live transcription, with latency configurable down to under 200 milliseconds, which matters for voice agents and other real-time applications.

Voxtral Realtime uses a novel streaming architecture that transcribes audio as it arrives instead of waiting for a complete recording, which minimizes delay. At a 2.4-second delay it matches the accuracy of Mini Transcribe V2, and at 480 milliseconds its word error rate stays within 1-2% of that level. Mistral also highlights its multilingual performance across languages such as English, Chinese, Hindi, and Spanish, which suits voice-first applications.

Mini Transcribe V2 improves on transcription and diarization quality, achieving a word error rate of approximately 4% on the FLEURS benchmark and outperforming competitors such as GPT-4o Mini Transcribe and Deepgram Nova in accuracy. According to Mistral, it also processes audio about three times faster than ElevenLabs’ Scribe v2 at a fraction of the cost.
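For readers unfamiliar with the metric, word error rate is the number of word-level substitutions, deletions, and insertions needed to turn the model's output into the reference transcript, divided by the number of words in the reference. The Python sketch below is a minimal illustration of that definition, not the evaluation code used for the FLEURS results; the function name `word_error_rate` and the simple lowercase-and-split normalization are assumptions made for this example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Illustrative only: real benchmarks apply language-specific text
    normalization before scoring, which this sketch does not attempt.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in output
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    reference = "the quick brown fox jumps over the lazy dog"
    hypothesis = "the quick brown fox jumped over lazy dog"
    print(f"WER: {word_error_rate(reference, hypothesis):.3f}")
```

Under this definition, a 4% word error rate corresponds to roughly one wrong, missing, or extra word for every 25 words of reference text.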

Key features of Mini Transcribe V2 include speaker diarization with precise start and end timestamps, which suits applications such as meeting transcription and interview analysis. Context biasing lets users supply specific phrases to guide the model’s handling of technical terms or proper nouns, which is particularly valuable in specialized industries. The model also maintains accuracy in noisy environments and accepts recordings of up to three hours in a single request.
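To make the diarization use case concrete, the sketch below shows one way speaker-labelled segments with start and end timestamps could be turned into a readable meeting transcript. The `Segment` structure, its field names, and the speaker labels are hypothetical and chosen for illustration; they are not the response format of the Voxtral API.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    # Hypothetical shape for one diarized segment (not an official schema).
    speaker: str
    start: float  # seconds from the start of the recording
    end: float
    text: str


def merge_turns(segments: list[Segment]) -> list[Segment]:
    """Join consecutive segments from the same speaker into one turn."""
    turns: list[Segment] = []
    for seg in sorted(segments, key=lambda s: s.start):
        if turns and turns[-1].speaker == seg.speaker:
            last = turns[-1]
            turns[-1] = Segment(
                last.speaker, last.start, max(last.end, seg.end),
                f"{last.text} {seg.text}",
            )
        else:
            turns.append(Segment(seg.speaker, seg.start, seg.end, seg.text))
    return turns


def format_timestamp(seconds: float) -> str:
    """Render seconds as HH:MM:SS, truncating fractional seconds."""
    total = int(seconds)
    return f"{total // 3600:02d}:{total % 3600 // 60:02d}:{total % 60:02d}"


def render(turns: list[Segment]) -> str:
    """One line per speaker turn, prefixed with its time range."""
    return "\n".join(
        f"[{format_timestamp(t.start)}-{format_timestamp(t.end)}] "
        f"{t.speaker}: {t.text}"
        for t in turns
    )


if __name__ == "__main__":
    segments = [
        Segment("speaker_1", 0.0, 2.5, "Hello everyone."),
        Segment("speaker_1", 2.5, 5.0, "Let's begin."),
        Segment("speaker_2", 5.2, 8.0, "Thanks for having me."),
    ]
    print(render(merge_turns(segments)))
```

Merging adjacent segments by speaker keeps the transcript compact while preserving the earliest start and latest end time of each turn.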

Mistral’s new audio playground enables users to upload multiple audio files, toggle diarization options, and choose timestamp granularity, supporting various formats up to 1GB each. This interactive platform encourages immediate testing of the new transcription capabilities.

Mistral expects the models to serve voice applications across several sectors. For meeting intelligence, they transcribe multilingual recordings with clear speaker attribution, which makes meeting content easier to annotate. They also support responsive voice interfaces for virtual assistants when combined with large language models and text-to-speech systems.

In contact centers, real-time transcription capabilities allow AI systems to analyze sentiment and populate customer relationship management fields during live conversations. Media and broadcast applications benefit from the ability to generate live multilingual subtitles with minimal latency, while compliance and documentation processes are streamlined through accurate monitoring and transcription of interactions.

Both models can be deployed on-premise or in private cloud environments, which Mistral says supports GDPR- and HIPAA-compliant use.

Voxtral Mini Transcribe V2 is available now via API for $0.003 per minute, while Voxtral Realtime is offered at $0.006 per minute and as open weights on the Hugging Face Hub. Mistral points interested developers to its documentation on audio and transcription capabilities and invites those passionate about speech AI to consider joining its team.
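As a rough guide to what these prices mean in practice, the sketch below estimates API cost from audio duration using the two per-minute rates quoted above, and counts how many batch requests a recording would need given the three-hour limit per request. It assumes cost scales linearly with duration; actual billing granularity and rounding are not described in the announcement, and the function names are illustrative.

```python
import math

# Per-minute prices and request limit as quoted in the article.
BATCH_USD_PER_MINUTE = 0.003      # Voxtral Mini Transcribe V2
REALTIME_USD_PER_MINUTE = 0.006   # Voxtral Realtime
MAX_MINUTES_PER_REQUEST = 3 * 60  # up to three hours per batch request


def estimate_cost(minutes: float, usd_per_minute: float) -> float:
    """Estimated cost in USD, assuming simple linear per-minute pricing."""
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    return minutes * usd_per_minute


def batch_requests_needed(minutes: float) -> int:
    """Number of batch requests needed to cover a recording of this length."""
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    return math.ceil(minutes / MAX_MINUTES_PER_REQUEST)


if __name__ == "__main__":
    for minutes in (60, 180, 400):
        batch = estimate_cost(minutes, BATCH_USD_PER_MINUTE)
        live = estimate_cost(minutes, REALTIME_USD_PER_MINUTE)
        print(
            f"{minutes:>4} min: batch ${batch:.2f}, realtime ${live:.2f}, "
            f"{batch_requests_needed(minutes)} batch request(s)"
        )
```

At these rates, a full three-hour recording costs about $0.54 to transcribe in batch mode and about $1.08 through the realtime API.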

Written by AiPressa Staff

The AiPressa Staff team brings you comprehensive coverage of the artificial intelligence industry, including breaking news, research developments, business trends, and policy updates. Our mission is to keep you informed about the rapidly evolving world of AI technology.

© 2025 AIPressa · Part of Buzzora Media · All rights reserved. This website provides general news and educational content for informational purposes only. While we strive for accuracy, we do not guarantee the completeness or reliability of the information presented. The content should not be considered professional advice of any kind. Readers are encouraged to verify facts and consult appropriate experts when needed. We are not responsible for any loss or inconvenience resulting from the use of information on this site. Some images used on this website are generated with artificial intelligence and are illustrative in nature. They may not accurately represent the products, people, or events described in the articles.