Sarvam AI Surpasses Google and OpenAI with Homegrown Speech and Vision Models

Sarvam AI’s local models, Sarvam Audio and Sarvam Vision, outpace Google and OpenAI with 93.28% document accuracy, revolutionizing AI for Indian users.

New Delhi: India’s technology landscape is making significant strides with new releases from Sarvam AI, a startup based in Bengaluru. The company has launched two artificial intelligence models tailored specifically for Indian users, Sarvam Audio and Sarvam Vision. The models are positioned to compete with global giants like Google and OpenAI, and on the benchmarks reported below they show superior performance in contexts relevant to India.

In a country where voice communication is paramount, Sarvam AI has developed systems that prioritize speech over traditional text-based interaction. Many Indians, including farmers and delivery workers, rely on verbal instructions in their daily activities. In response, Sarvam Audio has been trained from the ground up on 22 Indian languages, enabling it to understand “code-mixing,” where speakers switch fluidly between languages within a conversation, a common feature of everyday Indian speech.

The reported benchmark results favor Sarvam’s models. On the IndicVoices benchmark, Sarvam Audio has consistently surpassed competitors, recording a notably lower Word Error Rate (WER) in transcription than Google’s Gemini-3-Flash and OpenAI’s GPT-4o. Similarly, the visual model, Sarvam Vision, achieved 84.3 percent accuracy on the olmOCR-Bench, outperforming both Gemini 3 Pro and DeepSeek. In document analysis, Sarvam Vision scored 93.28 percent on the OmniDoc benchmark, suggesting that smaller, specialized models can exceed the capabilities of larger global systems when handling Indian-specific documents, tables, and formulas.
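For readers unfamiliar with the metric, Word Error Rate is conventionally computed as the number of word-level substitutions, deletions, and insertions needed to turn the system's transcript into the reference transcript, divided by the number of words in the reference. The Python sketch below illustrates that standard definition only. It is not Sarvam AI's or IndicVoices' evaluation code, the function name `word_error_rate` is hypothetical, and the sample sentences are made up for illustration. Real evaluations typically also normalize text (casing, punctuation, numerals) before scoring, which this sketch omits.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference words.

    Illustrative helper (hypothetical name); words are split on whitespace
    with no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of an extra word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: one wrong word out of four reference words.
print(word_error_rate("mujhe nau tickets chahiye", "mujhe no tickets chahiye"))
```

A lower value is better: a transcript identical to the reference scores 0.0, and a single misheard word in a four-word utterance scores 0.25.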

Among its features, Sarvam Audio introduces a Speech-to-Command capability. Unlike traditional pipelines that must first transcribe speech into text before any action is taken, Sarvam Audio can initiate actions directly from voice commands. Skipping the intermediate transcription step reduces latency and minimizes misunderstandings, particularly in noisy environments. For instance, when a user says “Nau” in Hindi, Sarvam Audio accurately interprets it as the numeral “9,” while other systems may misinterpret it as the English word “No.”

Additionally, Sarvam Audio integrates advanced speaker diarization technology, allowing it to distinguish between up to eight different voices in a single audio recording. This feature is particularly beneficial in the Indian context, where overlapping voices are commonplace in busy offices and call centers. The model is also optimized for 8kHz telephony, ensuring reliable performance even with the low-quality audio often found in traditional customer service calls.

Sarvam AI’s growth is backed by the IndiaAI Mission and government-supported GPU clusters, emphasizing a commitment to developing sovereign AI technology within India. By crafting models that cater exclusively to Indian users, Sarvam AI aims to reduce reliance on foreign systems, aligning with a broader vision of ensuring that India retains control over its digital landscape.

By launching Sarvam Audio and Sarvam Vision, Sarvam AI is not merely presenting itself as an alternative to major tech companies; rather, it is positioning itself as a leader in AI innovation that takes into account the unique needs of Indian users. This approach is particularly relevant as India seeks to serve the next billion users, demonstrating a commitment to shaping how artificial intelligence can enhance everyday life in the country.

Microsoft has partnered with Sarvam AI to spur advancements in voice-based GenAI applications, highlighting the growing recognition of Sarvam’s capabilities in the tech ecosystem. In another notable collaboration, Republic and Sarvam AI made headlines by enabling real-time translation of Finance Minister Nirmala Sitharaman’s budget speech, showcasing a practical application of the technology.

As Sarvam AI continues to carve its niche in the competitive AI sector, its focus on local needs and challenges may serve as a beacon for future developments in technology tailored for specific regions and cultures.

Written By
Staff

The AiPressa Staff team brings you comprehensive coverage of the artificial intelligence industry, including breaking news, research developments, business trends, and policy updates. Our mission is to keep you informed about the rapidly evolving world of AI technology.

© 2025 AIPressa · Part of Buzzora Media · All rights reserved. This website provides general news and educational content for informational purposes only. While we strive for accuracy, we do not guarantee the completeness or reliability of the information presented. The content should not be considered professional advice of any kind. Readers are encouraged to verify facts and consult appropriate experts when needed. We are not responsible for any loss or inconvenience resulting from the use of information on this site. Some images used on this website are generated with artificial intelligence and are illustrative in nature. They may not accurately represent the products, people, or events described in the articles.