AI Research

Apple Unveils ParaRNN, Achieving 665× Faster Training for 7B-Parameter RNNs

Apple’s new ParaRNN framework accelerates RNN training by 665×, enabling efficient large language models with up to 7 billion parameters.

Apple researchers have made a significant breakthrough in the training efficiency of recurrent neural networks (RNNs), making large-scale training of nonlinear RNNs feasible for the first time. Their new framework, detailed in a paper titled “ParaRNN: Unlocking Parallel Training of Nonlinear RNNs for Large Language Models,” has been accepted for presentation at ICLR 2026. The advance lets practitioners explore a broader range of architectures when designing large language models (LLMs), particularly in scenarios where computational resources are limited.

The new ParaRNN framework achieves a remarkable 665× speedup over conventional sequential training methods, facilitating the training of RNNs with up to 7 billion parameters. This development enhances the competitive performance of these classical models against transformer architectures, which have dominated the field of natural language processing in recent years. The researchers have made their codebase available as an open-source framework, enabling both researchers and practitioners to delve into efficient sequence modeling.

Traditionally, the sequential nature of RNNs has limited their scalability, as training could not be parallelized along the sequence length. While RNNs provide efficient and constant-time token generation during inference, their training process has been a bottleneck due to its step-by-step computational requirements. In contrast, transformers leverage attention mechanisms that allow for simultaneous processing of input tokens, but at the cost of increased computational complexity that grows quadratically with sequence length.
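The bottleneck can be seen in a toy sequential update (the names, sizes, and tanh cell below are illustrative, not Apple's code): each hidden state depends on the one before it, so the loop cannot be parallelized along the sequence length.

```python
import numpy as np

# Toy sequential RNN: step t needs the hidden state from step t-1,
# so the T iterations below are inherently serial.
rng = np.random.default_rng(0)
T, d = 8, 4                      # sequence length, hidden size (illustrative)
W = rng.normal(scale=0.3, size=(d, d))
U = rng.normal(scale=0.3, size=(d, d))
x = rng.normal(size=(T, d))      # input sequence

h = np.zeros(d)
states = []
for t in range(T):               # strictly sequential dependency chain
    h = np.tanh(W @ h + U @ x[t])
    states.append(h)
states = np.stack(states)
print(states.shape)              # (8, 4)
```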

To address these challenges, Apple’s researchers reformulated RNN training so that the nonlinear recurrence can be solved in parallel: each solver step reduces to a linear recurrence, the same kind of operation that lets selective state space models (SSMs) train efficiently along the sequence dimension. The researchers also introduced adaptations of classical GRUs and LSTMs, dubbed ParaGRU and ParaLSTM, which use structured Jacobians to keep the computation cheap while preserving expressivity.
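Why linear recurrences parallelize can be sketched in a few lines (a minimal illustration under assumed names and sizes, not the paper's code): composing two affine steps h → a·h + b is associative, so h_t = a_t·h_{t−1} + b_t can be evaluated by a prefix scan with O(log T) parallel depth. The combine is applied sequentially here for clarity; because it is associative, a tree reduction would give the same result.

```python
import numpy as np

def combine(left, right):
    """Compose two elementwise affine maps: h -> a2*(a1*h + b1) + b2."""
    a1, b1 = left
    a2, b2 = right
    return a2 * a1, a2 * b1 + b2

def scan_linear(a, b):
    """Inclusive scan over affine maps; with h_{-1} = 0, the b-component
    of the running composite is exactly the hidden state h_t."""
    elems = list(zip(a, b))
    out = [elems[0]]
    for e in elems[1:]:
        out.append(combine(out[-1], e))  # associative, hence parallelizable
    return np.stack([b_ for _, b_ in out])

T, d = 6, 3
rng = np.random.default_rng(1)
a = rng.uniform(0.5, 0.9, size=(T, d))   # elementwise (diagonal) transition
b = rng.normal(size=(T, d))

h_scan = scan_linear(a, b)

# Reference: plain sequential evaluation of the same recurrence.
h = np.zeros(d)
ref = []
for t in range(T):
    h = a[t] * h + b[t]
    ref.append(h)
print(np.allclose(h_scan, np.stack(ref)))  # True
```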

One of the pivotal techniques employed in the ParaRNN framework is the application of Newton’s method, a classical numerical technique for solving nonlinear equations. By framing the entire sequence of hidden states as a single system of equations to be solved simultaneously, this methodology allows for iterative refinements that maintain the nonlinear characteristics of traditional RNNs while taking advantage of parallel processing capabilities.
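A scalar sketch of this idea follows (hedged: the tanh cell, weights, and sizes are assumptions for illustration, not the paper's implementation). The whole recurrence is stacked into one nonlinear system F(h) = 0, and all hidden states are refined at once; each Newton step then solves a bidiagonal linear system, i.e. a linear recurrence, which is the part that parallelizes with a scan. It is solved sequentially below for clarity.

```python
import numpy as np

# Nonlinear system over the whole sequence:
#   F_t(h) = h_t - tanh(w * h_{t-1} + u * x_t) = 0,   t = 0..T-1,  h_{-1} = 0
rng = np.random.default_rng(2)
T = 16
w, u = 0.3, 1.0
x = rng.normal(size=T)

def residual(h):
    h_prev = np.concatenate(([0.0], h[:-1]))
    return h - np.tanh(w * h_prev + u * x)

h = np.zeros(T)                              # initial guess for all states
for _ in range(3):                           # a few Newton iterations
    h_prev = np.concatenate(([0.0], h[:-1]))
    pre = w * h_prev + u * x
    jac_sub = -w * (1.0 - np.tanh(pre) ** 2)  # dF_t / dh_{t-1}
    r = residual(h)
    # Newton step J * delta = -r with unit lower-bidiagonal J:
    #   delta_t + jac_sub_t * delta_{t-1} = -r_t   -- a linear recurrence,
    # solved sequentially here; ParaRNN-style training would use a scan.
    delta = np.zeros(T)
    delta[0] = -r[0]
    for t in range(1, T):
        delta[t] = -r[t] - jac_sub[t] * delta[t - 1]
    h = h + delta

print(float(np.max(np.abs(residual(h)))))    # tiny after three iterations
```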

Empirical results show that just three iterations of Newton’s method are enough for the adapted RNNs to match the hidden-state evolution of exact sequential training, dramatically reducing training time. In experiments spanning models from 400 million to 7 billion parameters, the researchers confirmed that classical RNNs can perform competitively when trained at scale: ParaGRU and ParaLSTM achieve perplexity and downstream performance on par with both transformers and state-of-the-art SSMs.

While the newly developed framework is designed to facilitate large-scale training, it still requires careful engineering to be practical. The parallel reduction algorithm central to this approach must efficiently handle the storage and multiplication of Jacobian matrices arising from the linearization process. To mitigate the complexity associated with generic RNNs, the researchers have prioritized structured Jacobians, which significantly reduce the computational demands of the training process.
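The saving from structure can be seen in a back-of-the-envelope comparison (the sizes below, and the choice of a purely diagonal structure, are assumptions for illustration; the paper's structured Jacobians need not be exactly diagonal). A dense d×d Jacobian block per timestep costs O(d²) storage and O(d³) per block multiply in the parallel reduction; a diagonal block drops both to O(d).

```python
import numpy as np

# Back-of-the-envelope Jacobian costs (illustrative sizes).
d, T = 1024, 4096

dense_storage = T * d * d              # floats for dense d x d blocks
diag_storage = T * d                   # floats for diagonal blocks
print(dense_storage // diag_storage)   # 1024x more storage for dense blocks

# Combining two blocks during the parallel reduction:
rng = np.random.default_rng(3)
A = rng.normal(size=(8, 8))
B = rng.normal(size=(8, 8))
dense_product = A @ B                  # O(d^3) matrix-matrix multiply
a = rng.normal(size=8)
b = rng.normal(size=8)
diag_product = a * b                   # O(d) elementwise product
```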

In terms of application, the real benefits of RNNs appear at inference time. An RNN’s per-token cost stays constant regardless of context length, making these models attractive for applications that prioritize rapid generation; a transformer’s per-token cost, by contrast, grows with the length of the sequence it must attend over.

Moreover, incorporating nonlinearities into the recurrence definitions has resulted in enhanced performance on tasks that require state tracking and retrieval capabilities. This capability highlights the advantages of nonlinear RNNs over purely linear models, underscoring the importance of expressivity in modern sequence modeling. The results indicate that classical RNNs, once constrained by computational limitations, can now scale effectively and potentially rival the performance of advanced transformer models.

As the landscape of artificial intelligence continues to evolve, the ParaRNN framework presents an opportunity to revisit nonlinear recurrence in modern sequence modeling, paving the way for novel architectures and enhanced modeling capabilities. With this development, Apple has not only advanced the field of RNN training but has also laid the groundwork for future exploration in recurrent models at scale.

Written by the AiPressa Staff.


© 2025 AIPressa · Part of Buzzora Media · All rights reserved.