New Delhi: Sarvam AI, a Bengaluru-based startup, has launched two artificial intelligence models built specifically for Indian users: Sarvam Audio and Sarvam Vision. The models are positioned to compete with offerings from global giants such as Google and OpenAI, and on several benchmarks they are reported to outperform those rivals in contexts relevant to India.
In a country where voice communication is paramount, Sarvam AI has developed systems that prioritize speech over traditional text-based interaction. Many Indians, including farmers and delivery workers, rely on verbal instructions in their daily activities. In response, Sarvam Audio has been trained from the ground up on 22 Indian languages, enabling it to understand “code-mixing,” where speakers switch fluidly between languages within a single conversation, a common feature of everyday Indian speech.
The reported benchmark results for Sarvam’s products are striking. On the IndicVoices benchmark, Sarvam Audio outperformed Google’s Gemini 3 Flash and OpenAI’s GPT-4o in transcription accuracy, recording a notably lower Word Error Rate (WER), the proportion of words a system gets wrong relative to a reference transcript. Similarly, the visual model, Sarvam Vision, achieved 84.3 percent accuracy on the olmOCR-Bench, outperforming both Gemini 3 Pro and DeepSeek. In document analysis, Sarvam Vision scored 93.28 percent on the OmniDoc benchmark, suggesting that smaller, specialized models can exceed the capabilities of larger global systems when handling Indian-specific documents, tables, and formulas.
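For readers unfamiliar with the metric, WER is conventionally computed as the number of word substitutions, deletions, and insertions needed to turn a system's transcript into the reference transcript, divided by the number of words in the reference. The Python sketch below illustrates that standard definition only. The function name `word_error_rate` and the sample sentences are illustrative assumptions, and nothing here reflects how IndicVoices or Sarvam AI actually score transcripts (real evaluations typically also normalize punctuation, casing, and script).

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Illustrative helper (hypothetical name), not a benchmark's official scorer.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


# Made-up example: one wrong word out of four gives a WER of 0.25.
print(word_error_rate("nau baje milte hain", "no baje milte hain"))
```

A lower value is better, and because insertions count as errors, WER can exceed 1.0 when a transcript contains many spurious words.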
Among its features, Sarvam Audio introduces a Speech-to-Command capability. Unlike traditional pipelines, which transcribe speech into text before any action is taken, Sarvam Audio can initiate actions directly from voice commands. Skipping the intermediate transcription step reduces latency and minimizes misunderstandings, particularly in noisy environments. For instance, when a user says “Nau” in Hindi, Sarvam Audio accurately interprets it as the numeral “9,” while other systems may misinterpret it as the English word “No.”
Additionally, Sarvam Audio integrates advanced speaker diarization technology, allowing it to distinguish between up to eight different voices in a single audio recording. This feature is particularly beneficial in the Indian context, where overlapping voices are commonplace in busy offices and call centers. The model is also optimized for 8kHz telephony, ensuring reliable performance even with the low-quality audio often found in traditional customer service calls.
Sarvam AI’s growth is backed by the IndiaAI Mission and government-supported GPU clusters, emphasizing a commitment to developing sovereign AI technology within India. By crafting models that cater exclusively to Indian users, Sarvam AI aims to reduce reliance on foreign systems, aligning with a broader vision of ensuring that India retains control over its digital landscape.
By launching Sarvam Audio and Sarvam Vision, Sarvam AI is not merely presenting itself as an alternative to major tech companies; rather, it is positioning itself as a leader in AI innovation that takes into account the unique needs of Indian users. This approach is particularly relevant as India seeks to serve the next billion users, demonstrating a commitment to shaping how artificial intelligence can enhance everyday life in the country.
Separately, Microsoft has partnered with Sarvam AI to spur advancements in voice-based GenAI applications, a sign of the growing recognition of Sarvam’s capabilities in the tech ecosystem. In another notable collaboration, Republic and Sarvam AI made headlines by enabling real-time translation of Finance Minister Nirmala Sitharaman’s budget speech, showcasing a practical application of the technology.
As Sarvam AI continues to carve its niche in the competitive AI sector, its focus on local needs and challenges may serve as a beacon for future developments in technology tailored for specific regions and cultures.