
NVIDIA Launches ProRL Agent to Enhance Multi-Turn LLM Training Efficiency at Scale

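NVIDIA has introduced ProRL AGENT, a rollout infrastructure for training multi-turn LLM agents. In reported results, Qwen3-8B nearly doubled its SWE-Bench Verified score, and the system is also designed to cut tool-execution latency and scale rollout throughput across nodes.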

NVIDIA researchers have unveiled ProRL AGENT, a scalable infrastructure for reinforcement learning (RL) training of multi-turn large language model (LLM) agents. The system adopts a ‘Rollout-as-a-Service’ approach, which separates agent rollout orchestration from the training loop. The separation targets the resource conflict between I/O-intensive environment interactions and GPU-intensive policy updates, a bottleneck in existing agent training frameworks.

Multi-turn agent tasks commonly involve complex interactions with external environments, such as code repositories or operating systems, and require iterative tool use. Many current frameworks, including SkyRL, VeRL-Tool, Agent Lightning, rLLM, and GEM, tightly couple rollout control with the training process. This coupling leads to two main issues: conflicting system requirements and maintenance barriers. Rollouts are dominated by I/O: they require sandbox creation and asynchronous coordination. Training is dominated by GPU work such as forward and backward passes. Running both inside one process makes each workload compete with the other for resources. In addition, embedding rollout logic within the trainer makes it harder to switch to a different training backend or runtime environment.

ProRL AGENT operates as a standalone HTTP service that manages the complete rollout lifecycle independently of the RL trainer, which communicates with the server via an API. Because the trainer remains agnostic to the underlying rollout infrastructure, either side can be changed without modifying the other. To optimize throughput, ProRL AGENT uses an asynchronous three-stage assembly line: initialization workers set up sandbox containers, rollout workers run the multi-turn agent loop, and evaluation workers score results to produce reward signals. The stages overlap across jobs, which prevents slow evaluations from stalling the rollout process.
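The three-stage assembly line described above can be sketched with `asyncio` queues. This is a minimal illustration, not ProRL AGENT's code: the function names, the simulated delays, and the reward formula are all assumptions made for the example. It shows only how independent worker pools connected by queues let initialization, rollout, and evaluation overlap across jobs.

```python
import asyncio

async def init_worker(jobs, ready):
    # Stage 1 (hypothetical): stand-in for creating a sandbox container.
    while True:
        job = await jobs.get()
        await asyncio.sleep(0.01)  # simulated sandbox start-up
        await ready.put({"job": job, "sandbox": f"sandbox-{job}"})
        jobs.task_done()

async def rollout_worker(ready, finished):
    # Stage 2 (hypothetical): stand-in for the multi-turn agent loop.
    while True:
        item = await ready.get()
        await asyncio.sleep(0.01)  # simulated tool calls and LLM turns
        item["trajectory"] = [f"turn-{t}" for t in range(3)]
        await finished.put(item)
        ready.task_done()

async def eval_worker(finished, results):
    # Stage 3 (hypothetical): stand-in for scoring a trajectory.
    while True:
        item = await finished.get()
        await asyncio.sleep(0.03)  # evaluation is deliberately the slow stage
        item["reward"] = float(len(item["trajectory"]))  # placeholder reward
        results.append(item)
        finished.task_done()

async def run_pipeline(num_jobs, workers_per_stage=2):
    jobs, ready, finished = asyncio.Queue(), asyncio.Queue(), asyncio.Queue()
    results = []
    for job_id in range(num_jobs):
        jobs.put_nowait(job_id)

    tasks = []
    for _ in range(workers_per_stage):
        tasks.append(asyncio.create_task(init_worker(jobs, ready)))
        tasks.append(asyncio.create_task(rollout_worker(ready, finished)))
        tasks.append(asyncio.create_task(eval_worker(finished, results)))

    # Each stage hands its item to the next queue before marking it done,
    # so joining the queues in order drains the whole pipeline.
    await jobs.join()
    await ready.join()
    await finished.join()

    for task in tasks:
        task.cancel()
    await asyncio.gather(*tasks, return_exceptions=True)
    return results

if __name__ == "__main__":
    for item in asyncio.run(run_pipeline(5)):
        print(item["job"], item["sandbox"], item["reward"])
```

Because each stage pulls from its own queue, a slow evaluation only delays the items waiting in the last queue. Initialization and rollout workers keep processing new jobs in the meantime.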

For sandboxing, ProRL AGENT uses Singularity, which supports rootless execution. This is crucial for deployment on shared high-performance computing (HPC) clusters managed by Slurm. The system also includes several optimizations to reduce tool execution latency, which often dominates total rollout time. These include replacing traditional terminal multiplexing with a more efficient approach, connecting directly to persistent IPython kernels to eliminate network overhead, and using Unix Domain Sockets for communication within the execution environment.
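As an illustration of the last optimization, the sketch below sends a request over a Unix Domain Socket using Python's standard `socket` module. It is not ProRL AGENT's implementation: the socket file name `exec.sock`, the `run_tool_call` helper, and the `ran: ` reply format are hypothetical. The point is that both ends address a filesystem path rather than a host and port.

```python
import os
import socket
import tempfile
import threading

def _read_all(sock):
    # Read until the peer closes its sending side.
    chunks = []
    while True:
        chunk = sock.recv(4096)
        if not chunk:
            break
        chunks.append(chunk)
    return b"".join(chunks)

def _serve_once(server_sock):
    # Hypothetical execution server: accept one request and acknowledge it.
    conn, _ = server_sock.accept()
    with conn:
        command = _read_all(conn)
        conn.sendall(b"ran: " + command)

def run_tool_call(command: bytes) -> bytes:
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "exec.sock")  # hypothetical socket path
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as server:
            server.bind(path)
            server.listen(1)
            worker = threading.Thread(target=_serve_once, args=(server,))
            worker.start()
            with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
                client.connect(path)
                client.sendall(command)
                client.shutdown(socket.SHUT_WR)  # signal end of request
                reply = _read_all(client)
            worker.join()
    return reply

if __name__ == "__main__":
    print(run_tool_call(b"ls -la"))
```

The example requires a platform with `AF_UNIX` support, such as Linux or macOS.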

ProRL AGENT also includes features designed to improve training stability and hardware utilization. The server manages a pool of LLM inference backends and optimizes prefix cache reuse to reduce inference time across multiple agent turns. To prevent re-tokenization drift, the system keeps token IDs as the canonical representation throughout the entire process, so the tokens the trainer sees are exactly the tokens the policy generated during rollout. It also supports Decoupled Clip and Dynamic Sampling Policy Optimization (DAPO), which filters out non-informative prompts, and it uses an asynchronous replenishment mechanism to keep throughput high while filtered prompts are replaced.
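The dynamic sampling filter can be sketched as follows. The sketch assumes that a "non-informative" prompt is one whose sampled rollouts all received the same reward, so a group-relative advantage would be zero for every rollout. The helper names `is_informative` and `fill_batch` are hypothetical, and the replenishment here is a plain synchronous loop rather than the asynchronous mechanism the article describes.

```python
from statistics import pstdev

def is_informative(rewards):
    # Identical rewards within a group carry no relative learning signal.
    return len(rewards) > 1 and pstdev(rewards) > 0.0

def fill_batch(prompt_stream, sample_rewards, batch_size):
    # Keep drawing prompts until the batch holds `batch_size` informative
    # groups or the stream runs out. `sample_rewards` is a hypothetical
    # callable that returns one reward per rollout for a prompt.
    batch = []
    for prompt in prompt_stream:
        rewards = sample_rewards(prompt)
        if is_informative(rewards):
            batch.append((prompt, rewards))
        if len(batch) == batch_size:
            break
    return batch

if __name__ == "__main__":
    fake_rewards = {
        "p0": [1, 1, 1, 1],  # always solved: filtered out
        "p1": [0, 1, 0, 1],  # mixed outcomes: kept
        "p2": [0, 0, 0, 0],  # never solved: filtered out
        "p3": [1, 0, 0, 0],  # mixed outcomes: kept
    }
    print(fill_batch(iter(fake_rewards), fake_rewards.get, batch_size=2))
```

Filtering shrinks the usable batch, which is why some form of replenishment is needed to keep the trainer supplied with full batches.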

Experimental results show performance improvements across model scales. The Qwen3-8B model nearly doubled its score on the SWE-Bench Verified benchmark, rising from 9.6% to 18.0%, and the Qwen3-14B model improved from 15.4% to 23.6%. Beyond software engineering, the system was also applied to STEM, math, and coding domains, with steady reward growth observed during RL training. Scalability tests showed that rollout throughput increases nearly linearly as additional compute nodes are added.
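To put the two reported gains on the same footing, the short calculation below derives the absolute gain in percentage points and the improvement ratio from the scores quoted above. The `summarize` helper is illustrative only.

```python
# SWE-Bench Verified scores (percent) as quoted in the article: (before, after).
reported = {
    "Qwen3-8B": (9.6, 18.0),
    "Qwen3-14B": (15.4, 23.6),
}

def summarize(before, after):
    return {
        "absolute_points": round(after - before, 1),
        "ratio": round(after / before, 3),
    }

if __name__ == "__main__":
    for model, (before, after) in reported.items():
        print(model, summarize(before, after))
```

The 8B model gains 8.4 points (a ratio of about 1.88), while the 14B model gains 8.2 points (a ratio of about 1.53). The absolute gains are similar, but only the 8B result is close to a doubling.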

ProRL AGENT decouples the rollout lifecycle from policy training. According to the reported results, this brings higher benchmark scores, lower tool-execution latency, and consistent tokenization between rollout and training, along with native deployment on HPC clusters. As demand for more capable agentic models grows, infrastructure of this kind could play a significant role in making their training more efficient.

Written by AiPressa Staff



© 2025 AIPressa · Part of Buzzora Media · All rights reserved.