AI Generative

NVIDIA Launches ProRL Agent to Enhance Multi-Turn LLM Training Efficiency at Scale

NVIDIA launches ProRL Agent, a rollout infrastructure for multi-turn LLM agent training that nearly doubles Qwen3-8B's score on SWE-Bench Verified while reducing tool-execution latency and improving scalability.

Staff

Published

28 March, 2026

NVIDIA researchers have unveiled ProRL Agent, a scalable infrastructure for reinforcement learning (RL) training of multi-turn large language model (LLM) agents. The system adopts a ‘Rollout-as-a-Service’ approach, which separates agent rollout orchestration from the training loop. By keeping I/O-intensive environment interactions apart from GPU-intensive policy updates, the architecture aims to remove the resource conflicts that bottleneck agent development.

Multi-turn agent tasks commonly involve complex interactions with external environments, such as code repositories or operating systems, and require iterative tool use. Many current frameworks, including SkyRL, VeRL-Tool, Agent Lightning, rLLM, and GEM, tightly couple rollout control with the training process. This coupling leads to two main issues: conflicting system requirements and maintenance barriers. Rollouts are dominated by I/O, since they require sandbox creation and asynchronous coordination, while training is dominated by GPU work such as forward and backward passes. Running both inside one process makes each compete with the other for resources. In addition, embedding rollout logic within the trainer makes it harder to switch to a different training backend or runtime environment.

ProRL Agent operates as a standalone HTTP service that manages the complete rollout lifecycle independently of the RL trainer, which communicates with the server via an API. This design keeps the trainer agnostic to the underlying rollout infrastructure. To optimize throughput, ProRL Agent uses an asynchronous three-stage assembly line: initialization workers set up sandbox containers, rollout workers manage the multi-turn agent loop, and evaluation workers score results to produce reward signals. Because the stages overlap across jobs, a slow evaluation does not stall the rollouts queued behind it.
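The three-stage assembly line can be illustrated with a small asyncio sketch. This is not ProRL Agent's code: the function names, queue layout, and sleep durations are hypothetical stand-ins, and each stage has a single worker for simplicity. It only shows how queues between stages let rollouts proceed while an earlier job is still being evaluated.

```python
import asyncio

async def stage(name, inbox, outbox, work, log):
    """Pull jobs from inbox, process them, and pass results downstream."""
    while True:
        job = await inbox.get()
        if job is None:                 # sentinel: forward shutdown and stop
            await outbox.put(None)
            break
        result = await work(job)
        log.append((name, job["id"]))   # record completion order
        await outbox.put(result)

async def init_sandbox(job):
    await asyncio.sleep(0.01)           # stands in for container start-up
    return {**job, "sandbox": f"sandbox-{job['id']}"}

async def run_rollout(job):
    await asyncio.sleep(0.02)           # stands in for the multi-turn agent loop
    return {**job, "trajectory": ["turn-1", "turn-2"]}

async def evaluate(job):
    await asyncio.sleep(0.10)           # deliberately slow scoring step
    return {**job, "reward": 1.0}

async def run_pipeline(num_jobs):
    init_q, rollout_q, eval_q, results_q = (asyncio.Queue() for _ in range(4))
    log = []
    workers = [
        asyncio.create_task(stage("init", init_q, rollout_q, init_sandbox, log)),
        asyncio.create_task(stage("rollout", rollout_q, eval_q, run_rollout, log)),
        asyncio.create_task(stage("eval", eval_q, results_q, evaluate, log)),
    ]
    for i in range(num_jobs):
        await init_q.put({"id": i})
    await init_q.put(None)              # no more jobs
    await asyncio.gather(*workers)
    done = []
    while (item := results_q.get_nowait()) is not None:
        done.append(item)
    return done, log

if __name__ == "__main__":
    done, log = asyncio.run(run_pipeline(3))
    print(log)
```

In the printed log, later rollouts finish before the first evaluation does, which is the overlap the article describes. A real service would run many workers per stage and talk to actual containers and inference backends.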

For sandboxing, ProRL Agent uses Singularity, which supports rootless execution and is therefore suited to shared high-performance computing (HPC) clusters managed by Slurm. The system also includes several optimizations aimed at reducing tool execution latency, which often dominates total rollout time. These include replacing traditional terminal multiplexing with a more efficient approach, connecting directly to persistent IPython kernels to eliminate network overhead, and using Unix Domain Sockets for communication within the execution environment.
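For readers unfamiliar with Unix Domain Sockets, the sketch below shows the general mechanism using Python's standard library: two endpoints on the same machine exchange bytes through a socket file rather than a network port. The socket path, the function names, and the uppercase reply are illustrative assumptions and say nothing about ProRL Agent's actual wire protocol.

```python
import os
import socket
import tempfile
import threading

def serve_once(server_sock):
    """Accept one connection and send back an uppercase copy of the request."""
    conn, _ = server_sock.accept()
    with conn:
        data = conn.recv(1024)
        conn.sendall(data.upper())

def round_trip(message: bytes) -> bytes:
    """Send message over a Unix domain socket and return the reply."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "tool.sock")   # hypothetical socket file
        server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        server.bind(path)
        server.listen(1)
        worker = threading.Thread(target=serve_once, args=(server,))
        worker.start()
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
            client.connect(path)                # connect by file path, not host:port
            client.sendall(message)
            reply = client.recv(1024)
        worker.join()
        server.close()
        return reply

if __name__ == "__main__":
    print(round_trip(b"ls -la"))
```

Because the connection never touches the TCP/IP stack, this kind of local channel avoids per-call network overhead, which matters when an agent issues many short tool calls per rollout.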

ProRL Agent also includes features designed to improve training stability and hardware utilization. The server manages a pool of LLM inference backends and routes requests to maximize prefix cache reuse, which cuts inference time across multiple agent turns. To prevent re-tokenization drift, the system keeps token IDs as the canonical representation throughout the entire process, ensuring consistency between rollout and training. It also supports the dynamic sampling used in Decoupled Clip and Dynamic Sampling Policy Optimization (DAPO), which filters out non-informative prompts, and it employs an asynchronous replenishment mechanism to maintain high throughput.
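The dynamic sampling idea can be sketched in a few lines. In DAPO, a prompt is non-informative when every sampled rollout for it receives the same reward, because the group then contributes no relative advantage signal. The sketch below drops such groups and keeps drawing prompts until the batch is full. The function names are hypothetical, and the loop is synchronous for clarity, whereas the article describes ProRL Agent's replenishment as asynchronous.

```python
def is_informative(rewards):
    """A prompt group carries a learning signal only if its rewards differ."""
    return len(set(rewards)) > 1

def fill_batch(prompt_stream, sample_rewards, batch_size):
    """Draw prompts until batch_size informative groups have been collected.

    prompt_stream:  iterable of prompts (assumed to be large enough)
    sample_rewards: callable returning the list of rewards for one prompt's rollouts
    """
    batch = []
    for prompt in prompt_stream:
        rewards = sample_rewards(prompt)
        if is_informative(rewards):
            batch.append((prompt, rewards))
        if len(batch) == batch_size:
            break                       # batch is full; stop replenishing
    return batch

if __name__ == "__main__":
    # Toy rewards: p1 always fails and p3 always succeeds, so both are filtered out.
    rewards_by_prompt = {
        "p1": [0, 0, 0, 0],
        "p2": [1, 0, 1, 0],
        "p3": [1, 1, 1, 1],
        "p4": [0, 1, 0, 0],
    }
    batch = fill_batch(iter(rewards_by_prompt), rewards_by_prompt.get, 2)
    print([prompt for prompt, _ in batch])
```

Filtering shrinks each batch, so the replenishment step is what keeps the trainer supplied with full batches of useful samples.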

Experimental results show performance improvements across model scales. The Qwen3-8B model nearly doubled its score on the SWE-Bench Verified benchmark, rising from 9.6% to 18.0%, a gain of 8.4 percentage points. The Qwen3-14B model improved from 15.4% to 23.6%, a gain of 8.2 points. Beyond software engineering, the system was also applied to STEM, math, and coding domains, with steady reward growth observed during RL training. Scalability tests found that rollout throughput increases nearly linearly as compute nodes are added.

ProRL Agent decouples the rollout lifecycle from policy training. According to the reported results, it delivers substantial benchmark gains, reduces tool-execution latency, and keeps tokenization consistent between rollout and training, while deploying natively on HPC clusters. As demand for more capable AI agents grows, infrastructure of this kind could play an important role in making their training practical at scale.

AIPRESSA.COM
