DeepSeek has significantly updated its R1 paper, first released on January 20, 2025, with a newly revised version published on arXiv on January 4, 2026. This update, which saw the document expand from 22 pages to 86, brings a wealth of new technical details without any official announcement or social media promotion. The latest version notably includes a complete breakdown of the training pipeline, expanded evaluation benchmarks, and a comprehensive technical appendix.
The R1 paper originally made headlines by demonstrating that pure reinforcement learning could enable large models to learn to reason on their own, without human-annotated reasoning data. It drew attention not only for its novel approach but also for the open release of both the model and the method, which sparked interest across the global AI landscape. Following publication, the paper was peer-reviewed and featured on the cover of Nature on September 17, 2025, a significant milestone: R1 became the first mainstream large model to pass peer review at a leading academic journal.
The recent update comes just weeks before the first anniversary of the R1 release and ahead of the Lunar New Year on February 17, a time when DeepSeek has historically made major announcements. Last year, the company unveiled both V3 and R1 during this festive period, leading to speculation about upcoming developments following this latest paper update.
The most notable aspect of the update is the extensive elaboration on the training process, which was previously only briefly outlined. The revised paper introduces three intermediate checkpoints—Dev1, Dev2, and Dev3—across the model's training phases. Dev1 improves instruction-following at the expense of reasoning ability, Dev2 works to restore those reasoning skills, and Dev3 further refines overall performance, addressing earlier concerns about the model's reasoning capabilities, particularly in long-chain tasks.
Alongside the training details, the evaluation framework has been significantly expanded: the updated paper now references more than 20 benchmarks, including MMLU, GPQA Diamond, and LiveCodeBench, up from the original five. Notably, it introduces a human baseline, showing that R1's performance exceeds average human scores on various tasks, a comparison that provides clearer context than traditional leaderboard rankings.
The appendices added in this version serve as a practical manual for researchers seeking to reproduce R1’s results, detailing everything from hyperparameters to reward function design. This shift from high-level overviews to granular operational guidance marks a clear intention to enhance reproducibility within the research community.
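To give a sense of what "reward function design" means in this context, the sketch below follows the general rule-based approach the R1 paper describes: a format reward that checks the response structure, plus an accuracy reward that checks the final answer. The tag scheme, function names, and exact-match grading here are illustrative assumptions, not DeepSeek's implementation.

```python
import re

def format_reward(completion: str) -> float:
    # 1.0 if the completion wraps its reasoning in <think>...</think>
    # followed by a final answer in <answer>...</answer>; the tag scheme
    # is an illustrative assumption modeled on the paper's description.
    pattern = r"^<think>.*?</think>\s*<answer>.*?</answer>$"
    return 1.0 if re.match(pattern, completion, re.DOTALL) else 0.0

def accuracy_reward(completion: str, reference: str) -> float:
    # Extract the final answer and compare it to the reference by exact
    # string match (a simplification; real graders for math tasks would
    # parse and compare expressions rather than raw strings).
    m = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    if m is None:
        return 0.0
    return 1.0 if m.group(1).strip() == reference.strip() else 0.0

def total_reward(completion: str, reference: str) -> float:
    # Simple additive combination; the actual weighting used in
    # training is not specified here.
    return format_reward(completion) + accuracy_reward(completion, reference)
```

Because both signals are computed by deterministic rules rather than a learned model, this style of reward is cheap to evaluate at scale and hard for the policy to exploit, which is part of why the paper's appendix-level detail on it matters for reproduction.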
Interestingly, the update also features a candid acknowledgment of unsuccessful techniques pursued by DeepSeek, including attempts with Monte Carlo Tree Search and Process Reward Models, both of which failed to deliver expected outcomes in general reasoning tasks. This transparency is relatively rare in a competitive industry often focused on maintaining proprietary advantages, and it suggests a willingness by DeepSeek to contribute openly to the collective knowledge base of AI research, potentially demystifying some industry challenges.
The timing of the update raises questions about DeepSeek’s strategic direction. By synchronizing the preprint with journal publication details—while also significantly enhancing the content—DeepSeek may be signaling that it has moved past R1’s technologies and is preparing for forthcoming innovations. This aligns with the company’s historical pattern of first publishing papers and then releasing models, suggesting that this update may clear the path for future announcements.
As the AI landscape continues to evolve, the implications of DeepSeek’s updated R1 paper will likely resonate throughout the research community. The commitment to open sourcing technical details and fostering reproducibility underscores a broader trend towards transparency that could influence how future developments are approached in the AI sector. The anticipation surrounding potential announcements in the coming weeks adds to the intrigue of what lies ahead for DeepSeek and the AI community at large.
See also
AI’s Rise Fuels Erosion of Online Trust, Experts Warn of Misinformation Crisis
Tech Giants Invest Billions in AI: Meta Acquires Manus, OpenAI Secures $300B Cloud Deal
Adobe Partners with Runway to Enhance Firefly AI Video Tools with Gen-4.5 Model
AI Leaders Rally Behind PM Modi’s Vision, Paving Path for Ethical Innovation in India
China vs Singapore: AI’s Divergent Impact on Labor Markets Reveals Critical Workforce Trends