Ant Group Unveils 100B-Parameter Diffusion Language Model, Rivaling AR Performance

Ant Group, in collaboration with top universities, unveils the 100 billion parameter LLaDA2.0-flash diffusion model, rivaling autoregressive performance in complex tasks.

In a significant advance for artificial intelligence, the diffusion large language model (dLLM) has evolved from a niche concept at the beginning of the year into models boasting hundreds of billions of parameters. Two new models recently appeared on the HuggingFace platform: LLaDA2.0-mini and LLaDA2.0-flash. Both were developed through a collaboration among Ant Group, Renmin University, Zhejiang University, and Westlake University, and both employ a Mixture of Experts (MoE) architecture. LLaDA2.0-mini comprises 16 billion parameters, while LLaDA2.0-flash features a remarkable 100 billion parameters, marking a milestone in the dLLM landscape.

As these models scale up, their capabilities have improved markedly. Across 47 benchmarks spanning knowledge, reasoning, coding, mathematics, and agent and alignment tasks, LLaDA2.0-flash achieved an average score of 73.18, closely trailing the well-known autoregressive model Qwen3-30B-A3B-Instruct-2507, which scored 73.60. LLaDA2.0-flash particularly excelled at complex tasks, such as the coding benchmarks HumanEval and MBPP.

Large-model text generation has historically been dominated by the autoregressive (AR) approach, which produces tokens sequentially from start to finish. This method has inherent limitations: high computational cost for long text, slow inference, and difficulty capturing bidirectional token dependencies. Because each token conditions on everything generated before it, errors early in the output can cascade into subsequent tokens and accumulate. The successful scaling of dLLMs, particularly through models like LLaDA2.0, demonstrates a promising alternative pathway. Notably, progress in this domain has not followed a single scaling recipe; it has emerged from diverse lines of exploration by researchers.
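The contrast between the two decoding styles can be sketched with a toy example. This is purely illustrative and not LLaDA2.0's actual decoder: `dummy_predict`, the tiny vocabulary, and the confidence scores are all stand-ins, but the control flow shows why diffusion-style denoising can fill several positions per model call while AR decoding is strictly one token at a time.

```python
import random

random.seed(0)
VOCAB = ["the", "cat", "sat", "on", "mat"]
MASK = "<mask>"

def dummy_predict(seq, pos):
    """Stand-in for a language model: return a (token, confidence)
    guess for position `pos` given the partially filled sequence."""
    return random.choice(VOCAB), random.random()

def autoregressive_decode(length):
    """Left-to-right generation: one token per model call,
    so `length` sequential steps are required."""
    seq = []
    for pos in range(length):
        tok, _ = dummy_predict(seq, pos)
        seq.append(tok)
    return seq

def diffusion_decode(length, tokens_per_step=2):
    """Iterative denoising: start fully masked, then unmask the
    highest-confidence positions in parallel at each step."""
    seq = [MASK] * length
    while MASK in seq:
        masked = [i for i, t in enumerate(seq) if t == MASK]
        guesses = [(pos, *dummy_predict(seq, pos)) for pos in masked]
        # commit only the most confident guesses this step
        guesses.sort(key=lambda g: g[2], reverse=True)
        for pos, tok, _ in guesses[:tokens_per_step]:
            seq[pos] = tok
    return seq
```

With `tokens_per_step=2`, a six-token sequence finishes in three denoising passes instead of six sequential calls; real dLLMs exploit the same parallelism, with the model's own confidences deciding the unmasking schedule.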

In September, LLaDA researchers demonstrated the viability of training a dLLM from scratch under the MoE framework, launching the 7B LLaDA-MoE as a novel take on the diffusion paradigm. Just three months later, the team achieved a further breakthrough: smoothly converting an established autoregressive model into a diffusion framework and scaling it to 100 billion parameters.

LLaDA2.0 offers a structured answer to the widely recognized challenges of scaling dLLMs. Rather than training from scratch, it performs a “smooth” conversion of an existing autoregressive model, combining a reconstructed training paradigm, closer coupling between pre-training and post-training, and optimization of both training and inference infrastructure.

Through Continuous Pre-training (CPT), an AR base model was converted into a Masked Diffusion Language Model (MDLM), gaining bidirectional denoising capabilities. This shift preserved the geometric structure of the original AR representations while letting the model learn from larger text segments. Block Diffusion Pre-training was then introduced, improving long-range generation consistency and computational efficiency. Training culminated in a comprehensive post-training phase of Supervised Fine-Tuning (SFT), Confidence-Aware Parallel Training (CAP), and Direct Preference Optimization (DPO), which align the model more closely with user instructions and improve inference efficiency.
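The MDLM training signal at the heart of the CPT stage can be sketched minimally. The following assumes standard masked-diffusion noising (each token is independently replaced by a mask with probability equal to a sampled mask ratio t, and only masked positions are supervised); `MASK_ID`, the `-100` ignore index, and `mask_tokens` are illustrative conventions borrowed from common training code, not LLaDA2.0's actual implementation.

```python
import random

MASK_ID = 0  # hypothetical mask token id

def mask_tokens(token_ids, t, rng=random):
    """Forward noising for a masked diffusion LM: independently
    replace each token with MASK_ID with probability t, where
    t in [0, 1] is the sampled mask ratio (the diffusion 'time').
    The denoiser is trained, with bidirectional context, to
    recover only the masked positions."""
    noisy, targets = [], []
    for tok in token_ids:
        if rng.random() < t:
            noisy.append(MASK_ID)
            targets.append(tok)   # supervise this masked slot
        else:
            noisy.append(tok)
            targets.append(-100)  # ignore index: no loss on kept tokens
    return noisy, targets
```

At t close to 1 the objective resembles generation from scratch; at small t it resembles light infilling. Averaging the cross-entropy on masked slots over sampled t values yields the denoising loss, which is what lets an AR checkpoint be continued into a bidirectional denoiser.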

Engineering optimizations across LLaDA2.0’s pre-training, post-training, and inference stages further addressed training stability and scalability. Using Megatron-LM as the training backend, the team combined multiple parallelism strategies to sustain high throughput for models with hundreds of billions of parameters, while innovations in the attention implementation accelerated training end to end and reduced memory consumption.

As the dLLM field continues to evolve, LLaDA2.0 stands as a transformative example, demonstrating a stable and scalable recipe for training diffusion models at the hundred-billion-parameter scale. This development not only enhances the capabilities of language models but also positions them to better serve complex tasks across domains, paving the way for a new era in AI research and applications.

Written By: AIPressa Staff

© 2025 AIPressa · Part of Buzzora Media · All rights reserved.