AI Generative

NVIDIA Launches ProRL Agent to Enhance Multi-Turn LLM Training Efficiency at Scale

NVIDIA launches ProRL AGENT, an infrastructure for multi-turn LLM agent training that nearly doubles Qwen3-8B's score on SWE-Bench Verified while reducing tool execution latency and scaling across compute nodes.

By Staff. Published 2 hours ago.

NVIDIA researchers have unveiled ProRL AGENT, a scalable infrastructure for reinforcement learning (RL) training of multi-turn large language model (LLM) agents. The system adopts a ‘Rollout-as-a-Service’ approach that separates agent rollout orchestration from the training loop. By removing the resource conflict between I/O-intensive environment interactions and GPU-intensive policy updates, the architecture targets a known bottleneck in agent development.

Multi-turn agent tasks typically involve complex interactions with external environments, such as code repositories or operating systems, and require iterative tool use. Many current frameworks, including SkyRL, VeRL-Tool, Agent Lightning, rLLM, and GEM, tightly couple rollout control with the training process. This coupling causes two main problems: conflicting system requirements and maintenance barriers. Rollouts are I/O-heavy, requiring sandbox creation and asynchronous coordination, while training is dominated by GPU-intensive forward and backward passes, so managing both within a single system leads to inefficiency. Embedding rollout logic in the trainer also makes it harder to switch training backends or runtime environments.

ProRL AGENT operates as a standalone HTTP service that manages the complete rollout lifecycle independently of the RL trainer, which communicates with the server through an API. This design keeps the trainer agnostic to the underlying rollout infrastructure, which adds flexibility. To optimize throughput, ProRL AGENT uses an asynchronous three-stage assembly line: initialization workers set up sandbox containers, rollout workers run the multi-turn agent loop, and evaluation workers score results to produce reward signals. Because the stages overlap across jobs, slow evaluations do not stall the rollout process.
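The three-stage assembly line can be illustrated with a small asyncio sketch. This is not NVIDIA's implementation: the worker functions, queue layout, sleep durations, and reward rule are all hypothetical stand-ins, chosen only to show how queues let a slow evaluation stage run while earlier stages keep working on later jobs.

```python
import asyncio


async def init_worker(jobs, ready):
    # Stage 1 (hypothetical): stand-in for sandbox container setup.
    while (job := await jobs.get()) is not None:
        await asyncio.sleep(0.01)
        await ready.put({"job": job, "sandbox": f"sandbox-{job}"})
    await ready.put(None)  # forward the shutdown signal


async def rollout_worker(ready, finished):
    # Stage 2 (hypothetical): stand-in for the multi-turn agent loop.
    while (item := await ready.get()) is not None:
        await asyncio.sleep(0.01)
        item["trajectory"] = [f"turn-{i}" for i in range(3)]
        await finished.put(item)
    await finished.put(None)


async def eval_worker(finished, results):
    # Stage 3 (hypothetical): deliberately slower scoring step.
    while (item := await finished.get()) is not None:
        await asyncio.sleep(0.03)
        item["reward"] = float(len(item["trajectory"]))
        results.append(item)


async def run_pipeline(job_ids):
    jobs, ready, finished = asyncio.Queue(), asyncio.Queue(), asyncio.Queue()
    results = []
    for job_id in job_ids:
        jobs.put_nowait(job_id)
    jobs.put_nowait(None)  # sentinel: no more jobs
    # All three stages run concurrently; queues decouple their speeds.
    await asyncio.gather(
        init_worker(jobs, ready),
        rollout_worker(ready, finished),
        eval_worker(finished, results),
    )
    return results


if __name__ == "__main__":
    for result in asyncio.run(run_pipeline(range(4))):
        print(result["job"], result["sandbox"], result["reward"])
```

In this sketch each stage has a single worker, so results come back in submission order; a real service would presumably run many workers per stage.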

For sandboxing, ProRL AGENT uses Singularity, which allows rootless execution, a requirement for deployment on shared high-performance computing (HPC) clusters managed by Slurm. The system also includes several optimizations that reduce tool execution latency, which often dominates total rollout time: replacing traditional terminal multiplexing with a more efficient approach, connecting directly to persistent IPython kernels to eliminate network overhead, and using Unix Domain Sockets for communication within the execution environment.
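For readers unfamiliar with Unix Domain Sockets, the following minimal sketch shows two endpoints on the same machine exchanging a message through a socket file rather than a network port. The one-shot server, the `tool.sock` path, and the "ran: ..." reply format are invented for illustration and say nothing about ProRL AGENT's actual protocol; the code assumes a Unix-like operating system.

```python
import os
import socket
import tempfile
import threading


def serve_once(sock_path, ready):
    # Hypothetical tool server: accepts one command and acknowledges it.
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as server:
        server.bind(sock_path)
        server.listen(1)
        ready.set()  # tell the client the socket file now exists
        conn, _ = server.accept()
        with conn:
            command = conn.recv(4096).decode()
            conn.sendall(f"ran: {command}".encode())


def send_command(sock_path, command):
    # Client side: connect to the socket file, no TCP/IP stack involved.
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
        client.connect(sock_path)
        client.sendall(command.encode())
        return client.recv(4096).decode()


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        sock_path = os.path.join(tmp, "tool.sock")
        ready = threading.Event()
        server = threading.Thread(target=serve_once, args=(sock_path, ready))
        server.start()
        ready.wait()
        reply = send_command(sock_path, "ls")
        server.join()
        return reply


if __name__ == "__main__":
    print(demo())
```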

ProRL AGENT also adds features designed to improve training stability and hardware utilization. The server manages a pool of LLM inference backends and optimizes prefix cache reuse to cut inference time across multiple agent turns. To prevent re-tokenization drift, the system keeps token IDs as the canonical representation throughout the entire process, ensuring consistency between rollout and training. It also supports Decoupled Clip and Dynamic Sampling Policy Optimization (DAPO), which filters out non-informative prompts, and it uses an asynchronous replenishment mechanism to maintain high throughput.
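The filtering idea behind dynamic sampling can be sketched in a few lines. The sketch assumes that a "non-informative" prompt is one whose sampled rollouts all received the same reward, so the group carries no relative learning signal; the function names and the synchronous loop are hypothetical simplifications, whereas the article describes replenishment as asynchronous.

```python
from typing import Callable, Iterable, List, Tuple


def is_informative(rewards: List[float]) -> bool:
    # Assumption: a group is useful only if its rollouts disagree on reward.
    return len(set(rewards)) > 1


def fill_batch(
    prompts: Iterable[str],
    sample_rewards: Callable[[str], List[float]],
    batch_size: int,
) -> List[Tuple[str, List[float]]]:
    # Keep drawing prompts until the batch holds `batch_size` informative
    # groups, skipping prompts where every rollout scored the same.
    batch: List[Tuple[str, List[float]]] = []
    for prompt in prompts:
        if len(batch) >= batch_size:
            break
        rewards = sample_rewards(prompt)
        if is_informative(rewards):
            batch.append((prompt, rewards))
    return batch


if __name__ == "__main__":
    # Hypothetical per-prompt rewards for four rollouts each.
    fake = {
        "p1": [0.0, 0.0, 0.0, 0.0],  # all failed: skipped
        "p2": [1.0, 0.0, 1.0, 0.0],  # mixed: kept
        "p3": [1.0, 1.0, 1.0, 1.0],  # all solved: skipped
        "p4": [0.0, 1.0, 1.0, 1.0],  # mixed: kept
    }
    print(fill_batch(fake, fake.__getitem__, batch_size=2))
```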

Experimental results show performance improvements across model scales. The Qwen3-8B model nearly doubled its score on the SWE-Bench Verified benchmark, rising from 9.6% to 18.0%, and the Qwen3-14B model improved from 15.4% to 23.6%. Beyond software engineering, the system was also applied to STEM, math, and coding domains, with steady reward growth observed during RL training. Scalability tests found that rollout throughput increases nearly linearly as compute nodes are added.
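To make "nearly double" concrete, the reported scores can be turned into absolute and relative gains. The helper names below are illustrative; the only data used are the four percentages quoted above.

```python
def absolute_gain(before: float, after: float) -> float:
    # Improvement in percentage points.
    return after - before


def relative_gain(before: float, after: float) -> float:
    # Ratio of the new score to the old one.
    return after / before


# SWE-Bench Verified scores as reported in the article: (before, after).
RESULTS = {"Qwen3-8B": (9.6, 18.0), "Qwen3-14B": (15.4, 23.6)}

if __name__ == "__main__":
    for model, (before, after) in RESULTS.items():
        points = absolute_gain(before, after)
        ratio = relative_gain(before, after)
        print(f"{model}: +{points:.1f} points, {ratio:.3f}x")
```

By this arithmetic the 8B model gains 8.4 points (roughly 1.9x its baseline), while the 14B model gains 8.2 points (roughly 1.5x), so the smaller model sees the larger relative improvement.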

ProRL AGENT marks a step forward for reinforcement learning infrastructure by decoupling the rollout lifecycle from policy training. It delivers the benchmark gains described above, lowers tool execution latency, keeps tokenization consistent between rollout and training, and deploys natively on HPC clusters. As demand for more capable AI agents grows, infrastructure of this kind could play an important role in making agent training more efficient.


AIPRESSA.COM

