In a significant advance for artificial intelligence, the diffusion large language model (dLLM) has grown from a niche concept at the beginning of the year into models at the hundred-billion-parameter scale. Two new models recently appeared on the Hugging Face platform: LLaDA2.0-mini and LLaDA2.0-flash. Developed jointly by Ant Group, Renmin University, Zhejiang University, and Westlake University, both models use a Mixture-of-Experts (MoE) architecture. LLaDA2.0-mini comprises 16 billion parameters, while LLaDA2.0-flash reaches 100 billion, a milestone for the dLLM landscape.
Capability has grown along with scale. Across 47 benchmarks spanning knowledge, reasoning, coding, mathematics, agent, and alignment tasks, LLaDA2.0-flash achieved an average score of 73.18, nearly matching the well-known autoregressive model Qwen3-30B-A3B-Instruct-2507 at 73.60. LLaDA2.0-flash was particularly strong on complex tasks such as the coding benchmarks HumanEval and MBPP.
Large-model text generation has historically been dominated by the autoregressive approach, which produces tokens one at a time from left to right. This method has inherent limitations: high computational cost for long outputs, slow inference, and difficulty capturing bidirectional dependencies between tokens. Because every step is conditioned on the tokens already emitted, an early error propagates into everything that follows and inaccuracies accumulate. The successful scaling of dLLMs, exemplified by LLaDA2.0, demonstrates a promising alternative. Notably, the rapid progress in this area has not come from a single line of continuous scaling, but from diversified exploration by researchers.
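To make that contrast concrete, here is a minimal, self-contained sketch of the two decoding regimes. The "model" is a random stand-in and the sizes are arbitrary; it illustrates only the general idea that an autoregressive decoder commits to one token per forward pass, while a masked-diffusion decoder starts from a fully masked sequence and reveals several tokens per denoising step. It is not the LLaDA2.0 implementation.

```python
import torch

VOCAB, SEQ_LEN = 1000, 16
MASK_ID = VOCAB  # a reserved id outside the normal vocabulary

def toy_logits(tokens: torch.Tensor) -> torch.Tensor:
    # Stand-in for a real network: random scores of shape (len(tokens), VOCAB).
    return torch.randn(tokens.shape[0], VOCAB)

def autoregressive_decode() -> torch.Tensor:
    # One token per forward pass, strictly left to right; every later step is
    # conditioned on the frozen prefix, so an early error propagates forward.
    out = torch.full((SEQ_LEN,), MASK_ID)
    for i in range(SEQ_LEN):
        out[i] = toy_logits(out[: i + 1])[i].argmax()
    return out

def masked_diffusion_decode(steps: int = 4) -> torch.Tensor:
    # Start fully masked; each denoising pass fills in the positions the model
    # is most confident about, several tokens at a time.
    out = torch.full((SEQ_LEN,), MASK_ID)
    per_step = SEQ_LEN // steps
    for _ in range(steps):
        probs = toy_logits(out).softmax(dim=-1)
        conf, pred = probs.max(dim=-1)
        conf[out != MASK_ID] = -1.0          # never overwrite revealed tokens
        for pos in conf.topk(per_step).indices:
            out[pos] = pred[pos]
    return out

print(autoregressive_decode())    # 16 forward passes
print(masked_diffusion_decode())  # 4 forward passes
```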
In September, the LLaDA researchers demonstrated the feasibility of training a dLLM from scratch on an MoE backbone, releasing the 7B LLaDA-MoE as a new take on the diffusion paradigm. Just three months later, the team went a step further: it smoothly converted an established autoregressive model into a diffusion model and scaled the approach to 100 billion parameters.
LLaDA2.0 offers a structured answer to the widely recognized challenges of scaling dLLMs. Rather than training from scratch, the team chose a "smooth" transformation of an existing autoregressive model, combining a redesigned training paradigm, tighter coordination between pre-training and post-training, and optimized training and inference infrastructure.
Through Continuous Pre-training (CPT), an autoregressive (AR) base model was converted into a Masked Diffusion Language Model (MDLM) with bidirectional denoising: tokens are randomly masked and the model learns to recover them from context on both sides. This transition preserves the geometric structure of the original AR representations while letting the model learn over larger spans of text. Block Diffusion Pre-training was then introduced to improve long-range generation consistency and computational efficiency. Training concluded with a comprehensive post-training phase of Supervised Fine-Tuning (SFT), Confidence-Aware Parallel Training (CAP), and Direct Preference Optimization (DPO), which align the model with user instructions and improve inference efficiency.
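As a rough illustration of what a masked-diffusion training step looks like, the sketch below corrupts a clean batch by masking a random fraction t of its tokens, asks a small bidirectional Transformer stand-in to recover the originals, and re-weights the masked-token cross-entropy by 1/t, as is standard in masked-diffusion formulations. The network, sizes, and weighting here are placeholder assumptions rather than the LLaDA2.0 recipe, and the SFT/CAP/DPO stages are omitted.

```python
import torch
import torch.nn.functional as F
from torch import nn

VOCAB, DIM = 1000, 128
MASK_ID = VOCAB  # reserved [MASK] id outside the normal vocabulary

class TinyDenoiser(nn.Module):
    """Small bidirectional Transformer stand-in (no causal mask)."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB + 1, DIM)       # +1 for [MASK]
        layer = nn.TransformerEncoderLayer(DIM, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(DIM, VOCAB)

    def forward(self, tokens):
        return self.head(self.encoder(self.embed(tokens)))

def diffusion_training_step(model, x0):
    """One masked-diffusion step: corrupt x0 by masking, predict the originals."""
    b, n = x0.shape
    t = torch.rand(b, 1).clamp(min=1e-3)                # per-sequence masking ratio
    masked = torch.rand(b, n) < t                       # forward process: mask w.p. t
    xt = torch.where(masked, torch.full_like(x0, MASK_ID), x0)
    logits = model(xt)                                  # bidirectional denoising
    ce = F.cross_entropy(logits.transpose(1, 2), x0, reduction="none")
    # Only masked positions contribute, re-weighted by 1/t.
    return ((ce * masked) / t).sum() / (b * n)

model = TinyDenoiser()
x0 = torch.randint(0, VOCAB, (4, 32))                   # toy "clean" token batch
loss = diffusion_training_step(model, x0)
loss.backward()
print(float(loss))
```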
Engineering optimizations across LLaDA2.0's pre-training, post-training, and inference stages further address training stability and scalability. Using Megatron-LM as the training backend, the team combined multiple parallelism strategies to sustain high throughput at the hundred-billion-parameter scale, and improvements to the attention implementation accelerated end-to-end training while reducing memory consumption.
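As an indication of what combining several parallelism strategies means in practice, the back-of-the-envelope below splits a GPU cluster across tensor, pipeline, expert, and data parallelism. All group sizes are invented example numbers for illustration, not the team's actual Megatron-LM configuration.

```python
# Illustrative multi-parallel layout for a large MoE model (example numbers only).
WORLD_SIZE = 256        # total GPUs (assumed)
TENSOR_PARALLEL = 4     # each weight matrix sharded across 4 GPUs
PIPELINE_PARALLEL = 4   # layer stack split into 4 sequential stages
EXPERT_PARALLEL = 8     # MoE experts spread over 8 GPUs within the data-parallel ranks

assert WORLD_SIZE % (TENSOR_PARALLEL * PIPELINE_PARALLEL) == 0
DATA_PARALLEL = WORLD_SIZE // (TENSOR_PARALLEL * PIPELINE_PARALLEL)  # 16 replicas
assert DATA_PARALLEL % EXPERT_PARALLEL == 0  # expert groups fit inside data-parallel

print(f"tensor={TENSOR_PARALLEL} x pipeline={PIPELINE_PARALLEL} x data={DATA_PARALLEL} "
      f"= {WORLD_SIZE} GPUs; experts sharded {EXPERT_PARALLEL}-way")
```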
As the dLLM field continues to evolve, LLaDA2.0 stands as a landmark example, demonstrating a stable and scalable recipe for training diffusion language models at large parameter counts. This development not only broadens the capabilities of language models but also positions them to serve complex tasks across domains, opening a new direction for AI research and applications.