
Nvidia Launches Nemotron 3 Nano Omni, Open Model with 30B Parameters and Advanced Multimodal Capabilities

Nvidia unveils the Nemotron 3 Nano Omni, an open multimodal model with 30 billion parameters that Nvidia says delivers up to nine times the throughput of Alibaba's Qwen3-Omni.

Nvidia has introduced the Nemotron 3 Nano Omni, an open multimodal model that processes text, images, video, and audio within a single architecture. Beyond the model’s performance, the release is notable for its unusually transparent account of the training data, a substantial portion of which was generated with the help of other models such as Qwen, GPT-OSS, Kimi, and DeepSeek-OCR.

The Nemotron 3 Nano Omni is an open multimodal model with 30 billion parameters. It uses a hybrid Mamba-Transformer architecture with Mixture-of-Experts routing, activating roughly three billion parameters per query. The model pairs Nvidia’s own C-RADIOv4-H vision encoder with the Parakeet-TDT audio encoder and offers a context window of up to 256,000 tokens, though official language support is limited to English.
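The sparse-activation idea behind running ~3B of 30B parameters per query can be sketched generically. The snippet below is an illustrative top-k Mixture-of-Experts gate, not Nvidia’s implementation; the expert counts and logits are made up for illustration:

```python
import numpy as np

def top_k_route(gate_logits: np.ndarray, k: int = 2):
    """Select the k highest-scoring experts and softmax-normalize their weights."""
    top = np.argsort(gate_logits)[-k:]                   # indices of the k best experts
    w = np.exp(gate_logits[top] - gate_logits[top].max())
    return top, w / w.sum()

# Illustrative numbers only: with 10 experts and 2 active per token, roughly
# one fifth of the expert parameters are touched per query -- the same sparsity
# idea as activating ~3B of 30B parameters.
rng = np.random.default_rng(0)
experts, weights = top_k_route(rng.normal(size=10), k=2)
print(experts, weights)
```

Only the selected experts’ weights participate in the forward pass, which is why per-query compute scales with the active parameters rather than the full parameter count.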

According to Nvidia’s technical report, the model is designed for agentic applications such as document processing, computer-use agents, video and audio analysis, and voice interaction. Benchmarks show Nemotron 3 Nano Omni surpassing its predecessor, the Nemotron Nano V2 VL, and competing closely with Alibaba’s Qwen3-Omni on tests including OCRBenchV2, MMLongBench-Doc, WorldSense, and VoiceBench. The largest jump comes on the OSWorld benchmark for GUI agents, where accuracy rose from 11.1 to 47.4 points over the earlier version. Nvidia also claims that, at equivalent interactivity levels, throughput can be up to nine times that of Qwen3-Omni.

While the performance metrics are notable, the way the training data was collected is just as revealing. Nvidia processed approximately 717 billion tokens across seven training stages, progressively expanding the context window. A substantial portion of the synthetic training data was derived from competing models: image captions, question-answer pairs, and reasoning traces were generated with models such as Qwen3-VL-30B-A3B-Instruct, Qwen3.5-122B-A10B, and gpt-oss-120b, among others, and Nvidia also used GPT-4o and Gemini 3 Flash Preview for filtering.
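The caption-to-QA step of such a synthetic-data pipeline can be sketched generically. The teacher call below is a stub standing in for whichever model generates the answers; no real model API, prompt format, or dataset from Nvidia’s pipeline is referenced:

```python
def teacher_model(prompt: str) -> str:
    # Stub standing in for a call to a teacher LLM's inference API.
    return f"Synthetic answer for: {prompt}"

def make_qa_pairs(captions: list[str]) -> list[dict]:
    """Turn image captions into (question, answer) pairs for training data."""
    pairs = []
    for cap in captions:
        question = f"What does the image show? Context: {cap}"
        pairs.append({"question": question, "answer": teacher_model(question)})
    return pairs

dataset = make_qa_pairs(["a bar chart of quarterly revenue", "a street sign in fog"])
print(len(dataset))
```

A production pipeline would then filter these pairs, which is the role the article attributes to GPT-4o and Gemini 3 Flash Preview.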

Using other models to produce training data is common practice in the AI industry, though few companies are this transparent about it; OpenAI, Anthropic, and Google have previously accused Chinese AI labs of large-scale distillation. The audio data in Nemotron 3 Nano Omni includes Nvidia’s own Granary and SIFT-50M datasets, supplemented by captions from Qwen’s Omni-Captioner. For the reinforcement learning phase, Nvidia ran a five-stage pipeline across 25 distinct environments, covering tasks such as visual grounding, chart and document understanding, GUI interaction, and automatic speech recognition.

Nvidia’s release includes not only the model weights in BF16, FP8, and NVFP4, but also portions of the training data, the training pipelines on Megatron-Bridge, and reinforcement learning recipes on NeMo-RL, which sets it apart from the many releases that ship weights alone. The model’s reasoning mode is enabled by default, so users must disable it manually for tasks that do not need a chain-of-thought. It is distributed under the NVIDIA Open Model Agreement, which permits commercial use.
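How the reasoning mode is toggled depends on the serving stack, and the model card is the authoritative reference. As an illustration only, the payload below assumes an OpenAI-compatible endpoint and an `enable_thinking` chat-template flag, a convention some open-model servers use; the model id and field names here are assumptions, not documented Nemotron parameters:

```python
# Hypothetical request payload; model id and field names are illustrative only.
request = {
    "model": "nvidia/nemotron-3-nano-omni",              # placeholder model id
    "messages": [
        {"role": "user", "content": "Extract the totals from this invoice."}
    ],
    # Reasoning is on by default, so it must be switched off explicitly
    # when a chain-of-thought is not needed.
    "chat_template_kwargs": {"enable_thinking": False},
}
print(request["chat_template_kwargs"])
```

Skipping the chain-of-thought trades reasoning depth for lower latency and token cost on simple extraction or lookup tasks.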

The unveiling of the Nemotron 3 Nano Omni highlights Nvidia’s commitment to advancing multimodal AI technologies, setting a new benchmark in performance and transparency for the industry. As competition intensifies among AI developers, developments such as this will continue to shape the landscape of artificial intelligence applications across diverse sectors.

Written by the AiPressa Staff.


© 2025 AIPressa · Part of Buzzora Media · All rights reserved.