
DeepSeek Expands R1 Paper by 64 Pages, Prepares for V4 Release Ahead of Lunar New Year

DeepSeek expands its R1 paper from 22 to 86 pages, unveiling detailed training insights and benchmarks ahead of a potential V4 release this Lunar New Year.

DeepSeek has significantly updated its R1 paper, first released on January 20, 2025, with a revised version published on arXiv on January 4, 2026. The revision, which expands the document from 22 pages to 86, adds substantial technical detail and arrived without any official announcement or social media promotion. The new version includes a complete breakdown of the training pipeline, expanded evaluation benchmarks, and a comprehensive technical appendix.

The R1 paper originally made headlines by demonstrating that pure reinforcement learning could enable large models to learn reasoning independently, without human-annotated data. It garnered attention not only for its innovative approach but also for its open-source model and method, which sparked interest across the global AI landscape. Following its publication, the paper was peer-reviewed and featured on the cover of Nature on September 17, 2025, a significant milestone for DeepSeek, as R1 became the first mainstream large model to pass peer review in a leading academic journal.

The recent update comes just weeks before the first anniversary of the R1 release and ahead of the Lunar New Year on February 17, a time when DeepSeek has historically made major announcements. Last year, the company unveiled both V3 and R1 during this festive period, leading to speculation about upcoming developments following this latest paper update.

The most notable aspect of the update is the extensive elaboration on the training process, which was previously only briefly outlined. The revised paper describes three critical checkpoints (Dev1, Dev2, and Dev3) in the model's training phases. According to the paper, Dev1 improves instruction-following at the expense of reasoning ability, Dev2 restores those reasoning skills, and Dev3 refines performance using advanced techniques, addressing concerns about the model's reasoning capabilities, particularly in long-chain tasks.

Alongside the training details, the evaluation framework has also been significantly expanded. The updated paper now references over 20 benchmarks, including MMLU, GPQA Diamond, and LiveCodeBench, up from the original five. Notably, the paper introduces a human baseline for comparison, showing that R1's performance exceeds average human scores on various tasks, a comparison that provides clearer context than traditional leaderboard rankings.

The appendices added in this version serve as a practical manual for researchers seeking to reproduce R1’s results, detailing everything from hyperparameters to reward function design. This shift from high-level overviews to granular operational guidance marks a clear intention to enhance reproducibility within the research community.
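The reward-function design discussed in the appendices can be pictured with a toy rule-based reward of the kind the R1 line of work is known for, combining a format check with an answer-accuracy check. This is a minimal illustrative sketch: the tag names, weights, and function below are hypothetical and are not taken from DeepSeek's actual implementation.

```python
import re

def reward(response: str, reference_answer: str) -> float:
    """Toy rule-based reward: format adherence plus answer accuracy.

    Hypothetical sketch only; real reward designs verify answers and
    weight components far more carefully than this.
    """
    score = 0.0
    # Format reward: reasoning enclosed in <think>...</think> tags.
    if re.search(r"<think>.*?</think>", response, re.DOTALL):
        score += 0.2
    # Accuracy reward: the final answer matches the reference exactly.
    match = re.search(r"<answer>(.*?)</answer>", response, re.DOTALL)
    if match and match.group(1).strip() == reference_answer.strip():
        score += 1.0
    return score

good = "<think>2 + 2 is 4</think><answer>4</answer>"
bad = "The answer is 4."
print(reward(good, "4"))  # 1.2
print(reward(bad, "4"))   # 0.0
```

Because such rewards are computed from the model's text alone, with no learned reward model in the loop, they are cheap to run at scale, which is one reason rule-based rewards are attractive for pure reinforcement learning setups like the one the paper describes.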

Interestingly, the update also features a candid acknowledgment of unsuccessful techniques pursued by DeepSeek, including attempts with Monte Carlo Tree Search and Process Reward Models, both of which failed to deliver the expected outcomes in general reasoning tasks. Such transparency is relatively rare in a competitive industry focused on maintaining proprietary advantages. It suggests a willingness by DeepSeek to contribute openly to the collective knowledge base of AI research, potentially demystifying some common dead ends in the field.

The timing of the update raises questions about DeepSeek’s strategic direction. By synchronizing the preprint with journal publication details—while also significantly enhancing the content—DeepSeek may be signaling that it has moved past R1’s technologies and is preparing for forthcoming innovations. This aligns with the company’s historical pattern of first publishing papers and then releasing models, suggesting that this update may clear the path for future announcements.

As the AI landscape continues to evolve, the implications of DeepSeek's updated R1 paper will likely resonate throughout the research community. The commitment to open-sourcing technical details and fostering reproducibility underscores a broader trend toward transparency that could shape how future developments are approached in the AI sector. The anticipation surrounding potential announcements in the coming weeks adds to the intrigue of what lies ahead for DeepSeek and the AI community at large.

Written By

The AiPressa Staff team brings you comprehensive coverage of the artificial intelligence industry, including breaking news, research developments, business trends, and policy updates. Our mission is to keep you informed about the rapidly evolving world of AI technology.


© 2025 AIPressa · Part of Buzzora Media · All rights reserved.