
DeepSeek Expands R1 Paper by 64 Pages, Prepares for V4 Release Ahead of Lunar New Year

DeepSeek has expanded its R1 paper from 22 to 86 pages, unveiling detailed training insights and benchmarks ahead of a potential V4 release around this Lunar New Year.

DeepSeek has significantly updated its R1 paper, first released on January 20, 2025, with a revised version posted to arXiv on January 4, 2026. The update, which expands the document from 22 to 86 pages, adds a wealth of technical detail and arrived without any official announcement or social media promotion. The new version includes a complete breakdown of the training pipeline, expanded evaluation benchmarks, and a comprehensive technical appendix.

The R1 paper originally made headlines by demonstrating that pure reinforcement learning could enable large models to develop reasoning abilities on their own, without human-annotated data. It drew attention not only for this approach but also because both the model and the method were open-sourced, sparking interest across the global AI landscape. After publication, the paper went through peer review and appeared on the cover of Nature on September 17, 2025, a milestone that made R1 the first mainstream large model to pass peer review at a leading academic journal.

The update comes just weeks before the first anniversary of R1's release and ahead of the Lunar New Year on February 17, a period when DeepSeek has historically made major announcements. Last year, the company released both V3 and R1 around the holiday season, so this paper update has fueled speculation about what comes next.

The most notable change is the detailed account of the training process, which the original version outlined only briefly. The revised paper introduces three intermediate checkpoints in the training pipeline: Dev1, Dev2, and Dev3. Dev1 improves instruction following at the expense of reasoning ability, Dev2 aims to restore that reasoning ability, and Dev3 refines overall performance using additional techniques. Laying out these stages addresses concerns about the model's reasoning capabilities, particularly on long-chain tasks.

The evaluation framework has also been significantly expanded. The updated paper reports results on more than 20 benchmarks, including MMLU, GPQA Diamond, and LiveCodeBench, up from five in the original version. It also adds a human baseline, showing that R1 exceeds average human scores on a range of tasks; this comparison gives clearer context than leaderboard rankings alone.

The new appendices serve as a practical manual for researchers trying to reproduce R1's results, covering everything from hyperparameters to reward function design. This shift from high-level overview to granular operational guidance signals a clear intent to improve reproducibility across the research community.

The update also candidly acknowledges techniques that did not work for DeepSeek, including Monte Carlo Tree Search and Process Reward Models, neither of which delivered the expected results on general reasoning tasks. Such transparency is relatively rare in a competitive industry that tends to guard proprietary advantages, and it suggests a willingness on DeepSeek's part to contribute openly to the shared body of AI research, which could spare other teams from repeating the same dead ends.

The timing of the update raises questions about DeepSeek's strategic direction. By bringing the arXiv preprint in line with the peer-reviewed journal version while substantially expanding its content, DeepSeek may be signaling that it has moved past R1's techniques and is preparing its next releases. This fits the company's historical pattern of publishing papers first and releasing models afterward, suggesting the update may clear the way for new announcements.

As the AI landscape evolves, DeepSeek's updated R1 paper is likely to be widely read and built upon across the research community. Its open release of technical detail and focus on reproducibility reflect a broader trend toward transparency that could shape how future AI work is published, and the prospect of new announcements in the coming weeks adds to the anticipation around DeepSeek's next move.

Written by AiPressa Staff

