NVIDIA Launches ProRL Agent to Enhance Multi-Turn LLM Training Efficiency at Scale

NVIDIA's ProRL Agent decouples rollout orchestration from RL training for multi-turn LLM agents, nearly doubling Qwen3-8B's SWE-Bench Verified score while cutting tool execution latency and scaling rollout throughput across compute nodes.

NVIDIA researchers have unveiled ProRL Agent, a scalable infrastructure for reinforcement learning (RL) training of multi-turn large language model (LLM) agents. The system takes a ‘Rollout-as-a-Service’ approach, separating agent rollout orchestration from the training loop. Because I/O-intensive environment interactions and GPU-intensive policy updates no longer compete for the same resources, the architecture aims to remove a common bottleneck in agent development.

Multi-turn agent tasks typically involve repeated interaction with external environments, such as code repositories or operating systems, through iterative tool use. Many current frameworks, including SkyRL, VeRL-Tool, Agent Lightning, rLLM, and GEM, tightly couple rollout control with the training process. This coupling causes two main problems: conflicting system requirements and maintenance barriers. Rollouts are I/O-bound, requiring sandbox creation and asynchronous coordination, while training is GPU-bound, dominated by forward and backward passes; running both inside one process leaves each waiting on the other. Embedding rollout logic in the trainer also makes it harder to switch to a different training backend or runtime environment.

ProRL Agent runs as a standalone HTTP service that manages the complete rollout lifecycle independently of the RL trainer. The trainer communicates with the server through an API and stays agnostic to the underlying rollout infrastructure. To maximize throughput, the service uses an asynchronous three-stage pipeline: initialization workers set up sandbox containers, rollout workers drive the multi-turn agent loop, and evaluation workers score results to produce reward signals. Because the stages overlap across jobs, a slow evaluation does not stall the rollouts queued behind it.
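The three-stage structure described above can be sketched with `asyncio` queues. This is a minimal illustration under stated assumptions, not ProRL Agent's actual code: the function names (`init_sandbox`, `run_rollout`, `evaluate`), the job dictionaries, and the sleep-based stand-ins for real work are all hypothetical, and each stage has a single worker where a real service would presumably run a pool.

```python
import asyncio

async def stage(name, inbox, outbox, work, log):
    """Pull jobs from inbox, apply `work`, and forward results until a None sentinel arrives."""
    while True:
        job = await inbox.get()
        if job is None:               # sentinel: pass it downstream and stop
            await outbox.put(None)
            return
        result = await work(job)
        log.append((name, job["id"]))
        await outbox.put(result)

# Hypothetical stand-ins for the real work; sleeps imitate I/O-bound waits.
async def init_sandbox(job):
    await asyncio.sleep(0.01)
    return {**job, "sandbox": f"sandbox-{job['id']}"}

async def run_rollout(job):
    await asyncio.sleep(0.02)
    return {**job, "trajectory": ["turn-1", "turn-2"]}

async def evaluate(job):
    await asyncio.sleep(0.03)
    return {**job, "reward": 1.0 if job["id"] % 2 == 0 else 0.0}

async def run_pipeline(num_jobs):
    q_init, q_rollout, q_eval, q_done = (asyncio.Queue() for _ in range(4))
    log = []
    workers = [
        asyncio.create_task(stage("init", q_init, q_rollout, init_sandbox, log)),
        asyncio.create_task(stage("rollout", q_rollout, q_eval, run_rollout, log)),
        asyncio.create_task(stage("eval", q_eval, q_done, evaluate, log)),
    ]
    for i in range(num_jobs):
        await q_init.put({"id": i})
    await q_init.put(None)            # no more jobs
    await asyncio.gather(*workers)
    results = []
    while (item := await q_done.get()) is not None:
        results.append(item)
    return results, log

results, log = asyncio.run(run_pipeline(4))
print([(r["id"], r["reward"]) for r in results])
```

Because each stage only waits on its own queue, the initialization worker can start preparing the next sandbox while earlier jobs are still in rollout or evaluation, which is the overlap the article describes.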

For sandboxing, ProRL Agent uses Singularity, which supports rootless execution, a requirement on shared high-performance computing (HPC) clusters managed by Slurm. The system also includes several optimizations that target tool execution latency, which often dominates total rollout time: replacing traditional terminal multiplexing with a more efficient approach, connecting directly to persistent IPython kernels to eliminate network overhead, and using Unix domain sockets for communication inside the execution environment.
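To make the Unix domain socket point concrete, here is a minimal sketch of local request/response over `AF_UNIX` using only the Python standard library. It is a generic illustration, not ProRL Agent's protocol: the socket file name `tool.sock`, the `ran: ...` reply format, and the helper names are assumptions made for the example. It requires a platform with Unix domain socket support, such as Linux or macOS.

```python
import os
import socket
import tempfile
import threading

def serve_once(server_sock):
    """Accept one connection and reply with a fake tool result."""
    conn, _ = server_sock.accept()
    with conn:
        command = conn.recv(4096).decode()
        conn.sendall(f"ran: {command}".encode())

def call_tool(sock_path, command):
    """Send one command over the socket file and return the reply."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
        client.connect(sock_path)
        client.sendall(command.encode())
        return client.recv(4096).decode()

with tempfile.TemporaryDirectory() as tmp:
    sock_path = os.path.join(tmp, "tool.sock")   # hypothetical socket file
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    server.bind(sock_path)
    server.listen(1)
    worker = threading.Thread(target=serve_once, args=(server,))
    worker.start()
    reply = call_tool(sock_path, "ls /workspace")
    worker.join()
    server.close()

print(reply)  # ran: ls /workspace
```

The socket is addressed by a filesystem path rather than a host and port, so traffic stays inside the kernel and never touches the TCP/IP stack.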

ProRL Agent also adds features aimed at training stability and hardware utilization. The server manages a pool of LLM inference backends and optimizes prefix cache reuse to cut inference time across an agent's many turns. To prevent re-tokenization drift, where decoding generated tokens to text and encoding that text again yields different token IDs, the system keeps token IDs as the canonical representation from rollout through training. It also supports DAPO (Decoupled Clip and Dynamic Sampling Policy Optimization), whose dynamic sampling filters out non-informative prompts, and pairs it with an asynchronous replenishment mechanism to keep throughput high.
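The dynamic sampling idea can be sketched in a few lines. In DAPO, a prompt whose sampled rollouts all receive the same reward contributes no learning signal under group-relative advantages, so it is dropped and sampling continues until the batch is full. The sketch below is synchronous and simplified; the names `is_informative`, `fill_batch`, and `fake_rewards` are hypothetical, and the asynchronous replenishment mentioned in the article is not modeled.

```python
def is_informative(rewards):
    """A prompt group carries signal only if its rollouts received different rewards."""
    return len(set(rewards)) > 1

def fill_batch(sample_group, prompts, batch_size):
    """Sample prompt groups in order, keeping informative ones until the batch is full."""
    batch = []
    for prompt in prompts:
        rewards = sample_group(prompt)
        if is_informative(rewards):
            batch.append((prompt, rewards))
        if len(batch) == batch_size:
            break
    return batch

# Hypothetical per-prompt rewards for four rollouts each (1 = solved, 0 = failed).
fake_rewards = {
    "p0": [1, 1, 1, 1],  # always solved: filtered out
    "p1": [0, 1, 0, 1],
    "p2": [0, 0, 0, 0],  # never solved: filtered out
    "p3": [1, 0, 0, 0],
    "p4": [0, 1, 1, 1],
}

batch = fill_batch(fake_rewards.__getitem__, fake_rewards, batch_size=2)
print([prompt for prompt, _ in batch])  # ['p1', 'p3']
```

Filtering shrinks the usable batch, which is why a replenishment step is needed: extra prompts must be sampled to replace the ones that were discarded.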

Experiments show gains across model scales. On the SWE-Bench Verified benchmark, Qwen3-8B nearly doubled its score, from 9.6% to 18.0%, and Qwen3-14B improved from 15.4% to 23.6%. Beyond software engineering, the system was also applied to STEM, math, and coding tasks, with steady reward growth observed during RL training. In scalability tests, rollout throughput increased nearly linearly as compute nodes were added.
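For readers who want the reported scores as absolute and relative gains, the arithmetic is short. The snippet uses only the four percentages quoted in the article; the helper name `gains` is hypothetical.

```python
# Scores reported in the article: (before RL, after RL), in percent.
scores = {"Qwen3-8B": (9.6, 18.0), "Qwen3-14B": (15.4, 23.6)}

def gains(before, after):
    """Return (absolute gain in percentage points, relative gain in percent)."""
    absolute = after - before
    return absolute, 100.0 * absolute / before

for model, (before, after) in scores.items():
    absolute, relative = gains(before, after)
    print(f"{model}: +{absolute:.1f} points, +{relative:.1f}% relative")
# Qwen3-8B: +8.4 points, +87.5% relative
# Qwen3-14B: +8.2 points, +53.2% relative
```

The two models gain a similar number of percentage points, but the smaller model starts lower, so its relative gain is larger.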

By decoupling the rollout lifecycle from policy training, ProRL Agent marks a meaningful step for reinforcement learning infrastructure. It delivers the benchmark gains reported above, lowers tool execution latency, keeps tokenization consistent between rollout and training, and deploys natively on HPC clusters. As demand for more capable agents grows, infrastructure of this kind could play an important role in making their training practical.

Written by Staff



© 2025 AIPressa · Part of Buzzora Media · All rights reserved.