Connect with us

Hi, what are you looking for?

AI Generative

MIT’s New TLT Method Doubles LLM Training Speed While Preserving Accuracy

MIT researchers unveil a new TLT method, boosting reasoning LLM training speed by 70-210% while maintaining accuracy, revolutionizing AI efficiency.

Researchers from MIT and other institutions have developed a novel technique to significantly enhance the training speed of reasoning large language models (LLMs), which are adept at tackling complex problems through step-by-step breakdowns. This breakthrough, presented at the upcoming ACM International Conference on Architectural Support for Programming Languages and Operating Systems, could revolutionize the efficiency of training these advanced models, critical for applications such as financial forecasting and risk detection in power grids.

The new method addresses the considerable computational and energy demands associated with training reasoning models, which often suffer from inefficiencies during the training process. While some high-powered processors tirelessly work on intricate queries, others remain idle, wasting potential computational resources. The innovative technique devised by the team leverages this downtime by training a smaller, faster model that predicts the outputs of the larger reasoning LLM, with the latter verifying the smaller model’s outputs. This not only accelerates the training process but also reduces the workload on the reasoning model, doubling training speed without sacrificing accuracy.

“People want models that can handle more complex tasks. But if that is the goal of model development, then we need to prioritize efficiency,” said Qinghao Hu, an MIT postdoc and co-lead author of the paper detailing the technique. Joining Hu on the paper are co-lead author Shang Yang, along with Junxian Guo, and senior author Song Han, an associate professor at MIT and distinguished scientist at NVIDIA. The research team includes members from ETH Zurich, the MIT-IBM Watson AI Lab, and the University of Massachusetts at Amherst.

The training bottleneck in reasoning LLMs arises when developers use reinforcement learning (RL) to enable these models to identify and rectify mistakes in their reasoning processes. The RL technique involves generating multiple potential answers to a query and rewarding the best option, updating the model based on these top answers. However, researchers found that the rollout process—generating multiple potential answers—can consume as much as 85 percent of the execution time in RL training. “Updating the model—which is the actual ‘training’ part—consumes very little time by comparison,” Hu noted.

This bottleneck occurs because all processors in the training pool must finish their responses before advancing to the next step. Consequently, processors generating shorter responses may idle while waiting for others to complete more extended tasks. The team sought to mitigate this issue using a technique known as speculative decoding, where a smaller model, called a drafter, rapidly predicts future outputs of the larger model. The larger model then verifies the drafter’s predictions, expediting the training process.

Traditionally, drafter models are trained only once and remain static, which is impractical for reinforcement learning where models are updated frequently. To address this limitation, the researchers developed a flexible system termed “Taming the Long Tail” (TLT). This system includes an adaptive drafter trainer that utilizes idle processor time to train the drafter model on the fly, maintaining alignment with the target model without incurring extra computational costs. The second part of TLT, an adaptive rollout engine, optimizes the speculative decoding strategy based on the features of the training workload.

The lightweight design of the drafter model facilitates quick training, allowing TLT to reuse components from the reasoning model training process, further boosting performance. “As soon as some processors finish their short queries and become idle, we immediately switch them to do drafter model training using the same data they are using for the rollout process,” Hu explained.

Testing the TLT across various reasoning LLMs, the researchers reported training speed improvements ranging from 70 to 210 percent while maintaining accuracy. The drafter model also has the potential for efficient deployment as an added benefit.

Looking ahead, the team aims to integrate TLT into a broader array of training and inference frameworks while exploring new RL applications that could benefit from this accelerated approach. “As reasoning continues to become the major workload driving the demand for inference, Qinghao’s TLT is great work to cope with the computation bottleneck of training these reasoning models. I think this method will be very helpful in the context of efficient AI computing,” Han remarked.

This research is funded by the MIT-IBM Watson AI Lab, the MIT AI Hardware Program, the MIT Amazon Science Hub, Hyundai Motor Company, and the National Science Foundation, highlighting the collaborative effort to push the boundaries of LLM capabilities.

See also
Staff
Written By

The AiPressa Staff team brings you comprehensive coverage of the artificial intelligence industry, including breaking news, research developments, business trends, and policy updates. Our mission is to keep you informed about the rapidly evolving world of AI technology.

You May Also Like

AI Research

IBM launches a Chicago Quantum Hub to create 750 AI jobs and expands its MIT partnership to advance quantum computing and AI integration.

AI Finance

More than 55% of Americans now turn to AI tools for financial advice, risking personal data exposure despite rising privacy concerns.

AI Research

MIT launches a groundbreaking study to assess AI capacity in Central Eurasia, guiding investment strategies across five nations with a focus on infrastructure and...

AI Research

MatterChat launches a multimodal LLM achieving 95% accuracy in material property predictions, revolutionizing materials science research and applications.

AI Business

Nvidia's Bryan Catanzaro reveals AI compute costs now exceed human labor expenses, with MIT finding 77% of jobs remain cheaper with human workers.

AI Research

IBM and MIT launch the MIT-IBM Computing Research Lab to redefine AI and quantum computing, aiming for transformative breakthroughs by 2029.

AI Research

MIT's Pattie Maes and Pat Pataranutaporn pioneer "intelligence augmented" AI to enhance decision-making and support human capabilities, redefining technology's role.

AI Generative

Cognizant reveals evolution strategies for large language model fine-tuning, enhancing efficiency and reliability while reducing costs in complex reasoning tasks.

© 2025 AIPressa · Part of Buzzora Media · All rights reserved. This website provides general news and educational content for informational purposes only. While we strive for accuracy, we do not guarantee the completeness or reliability of the information presented. The content should not be considered professional advice of any kind. Readers are encouraged to verify facts and consult appropriate experts when needed. We are not responsible for any loss or inconvenience resulting from the use of information on this site. Some images used on this website are generated with artificial intelligence and are illustrative in nature. They may not accurately represent the products, people, or events described in the articles.