
Beijing Academy of Artificial Intelligence Publishes Emu3 Model in Nature, Advancing Multimodal Learning

Beijing Academy of Artificial Intelligence’s Emu3 model, published in Nature, achieves state-of-the-art multimodal learning, surpassing competitors in performance benchmarks.

In a significant breakthrough for artificial intelligence, the Beijing Academy of Artificial Intelligence (BAAI) announced that its multimodal large model "Wujie·Emu3" was published in the main issue of Nature on January 29, 2026. This makes BAAI the second Chinese research team, after DeepSeek, to have a large model published in the journal, and marks China's first Nature paper in the multimodal large model domain.

The editors of Nature highlighted Emu3's capabilities, stating that it "achieves unified learning of large-scale text, images, and videos based solely on 'predicting the next token'." The model's performance on generation and perception tasks rivals that of specialized systems, an advance poised to shape the development of native multimodal assistants and embodied intelligence.

Launched in October 2024, Emu3 has shown impressive versatility, excelling at text-to-image and text-to-video generation, future prediction, and vision-language understanding. Its autoregressive approach is notable for its simplicity, offering a unified pathway for generative AI. On benchmarks it surpasses existing models: it outperformed diffusion models such as SDXL in image generation and scored 81 on VBench for video generation, edging out models like Open-Sora 1.2.

Jack Clark, co-founder of Anthropic and former head of policy at OpenAI, remarked on Emu3's architectural route, emphasizing its simplicity and potential for scaling. BAAI's president, Wang Zhongyuan, echoed this sentiment: "The simpler the architecture, the greater the potential productivity and the greater the value to the industry." The streamlined approach reduces complexity in research and development and makes models more efficient to build and maintain.

By October 2025, the Emu series had evolved into a multimodal world model, Emu3.5, which can simulate exploration and operations in virtual environments. This iteration not only achieved state-of-the-art multimodal performance but also introduced a "multimodal scaling paradigm," under which the model learns the inherent laws of world evolution. The innovation holds promise for physical AI, particularly embodied intelligence.

The journey to publish Emu3 in Nature involved overcoming significant challenges. Initiated in February 2024, the project set out to test whether the autoregressive route could unify multimodality, an open question in the field. The 50-person team undertook the ambitious task of building a high-performance, native multimodal large model on an autoregressive architecture: images, text, and videos are discretized into a shared representation space, and a single Transformer is trained jointly on the resulting multimodal sequences.
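The recipe described above, quantizing every modality into one discrete token space and then training a single model with a next-token objective, can be sketched in a few lines. Everything below is illustrative: the vocabulary sizes, the toy tokenizers, and the sentinel token are stand-ins, not BAAI's actual implementation.

```python
TEXT_VOCAB = 1000    # illustrative text vocabulary size
VISION_VOCAB = 500   # illustrative visual codebook size

def tokenize_text(words):
    # Toy stand-in for a subword tokenizer: hash words into the text id range.
    return [hash(w) % TEXT_VOCAB for w in words]

def tokenize_image(pixels):
    # Toy stand-in for a vector-quantized visual tokenizer: map pixel values
    # to codebook ids, offset past the text vocabulary so both modalities
    # share a single id space.
    return [TEXT_VOCAB + (p % VISION_VOCAB) for p in pixels]

def build_sequence(words, pixels):
    # Interleave modalities into one discrete sequence -- the key move that
    # lets a single next-token objective cover text, images, and video.
    BOI = TEXT_VOCAB + VISION_VOCAB  # "begin of image" sentinel token
    return tokenize_text(words) + [BOI] + tokenize_image(pixels)

seq = build_sequence(["a", "red", "square"], [12, 250, 99, 7])

# Training pairs for next-token prediction: predict seq[i] from seq[:i].
pairs = [(seq[:i], seq[i]) for i in range(1, len(seq))]
```

A real system replaces the toy tokenizers with a learned visual codebook and trains a Transformer with cross-entropy over these (context, target) pairs; the point is that once everything is tokens, no modality-specific training objective is needed.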

The path was fraught with technical risk, particularly the "discrete token" approach, which in effect had to reinvent a language-like token system for visual modalities. Compressing images into tokens effectively proved difficult and led to repeated setbacks. In a competitive landscape where many teams abandoned the route, BAAI stayed committed to a unified multimodal model, believing that a model capable of understanding the physical world was essential for progress toward artificial general intelligence (AGI).

Since its release, Emu3 has markedly shaped the multimodal landscape, gaining recognition across a range of tasks. In text-to-image generation it is comparable to leading diffusion models, and it holds its own in vision-language understanding without relying on a specialized pre-trained large language model. Its video generation produces competitive five-second clips at 24 frames per second, surpassing established models at that length.

As industries increasingly recognize Emu3’s contributions, its success reinforces the potential for autoregressive technology to serve as a foundation for unified multimodal learning. This development not only influences the direction of AI research but also lays the groundwork for future advancements in intelligent systems that can seamlessly integrate diverse modalities.

© 2025 AIPressa · Part of Buzzora Media · All rights reserved.