
LLMs Achieve Over 99% Accuracy as World Models for AI Agent Training, Study Reveals

Researchers show that fine-tuned large language models can exceed 99% accuracy as world models in a structured text environment, opening the way to training AI agents on simulated experience.

Recent research suggests that large language models (LLMs) can effectively simulate environments, addressing a significant challenge in training autonomous AI agents. Such agents depend on real-world interactions to gain experience, yet real environments can be limited, difficult to replicate, and often too rigid for diverse learning. Researchers from the Southern University of Science and Technology, Microsoft Research, Princeton University, the University of Edinburgh, and other institutions explored whether LLMs could serve as internal simulators, termed “world models,” to enable training through simulated experience rather than relying solely on real-world data.

A world model predicts the outcome of an action taken by an AI agent, allowing it to learn in a controlled, synthetic environment. This approach reframes language modeling from predicting the next word to forecasting the next state of the environment following an action. The researchers aimed to demonstrate that this capability allows LLMs to function as precise world simulators, potentially improving the efficiency of AI training.
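To make the idea concrete, here is a minimal Python sketch of next-state prediction framed as text completion. It is an illustration under assumptions, not the researchers' code: the prompt layout, the `Step` record, and the functions `build_prompt`, `simulate`, and `toy_predict` are all hypothetical, and `toy_predict` is a rule-based stand-in for a call to an actual LLM.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    """One interaction: the agent's action and the resulting observation."""
    action: str
    observation: str


def build_prompt(task: str, history: List[Step], action: str) -> str:
    """Serialize the interaction so far and leave the next observation blank."""
    lines = [f"Task: {task}"]
    for step in history:
        lines.append(f"Action: {step.action}")
        lines.append(f"Observation: {step.observation}")
    lines.append(f"Action: {action}")
    lines.append("Observation:")
    return "\n".join(lines)


def simulate(
    task: str,
    first_observation: str,
    actions: List[str],
    predict: Callable[[str], str],
) -> List[Step]:
    """Roll out a sequence of actions using predicted, not real, observations."""
    history = [Step("look", first_observation)]
    for action in actions:
        prompt = build_prompt(task, history, action)
        history.append(Step(action, predict(prompt).strip()))
    return history


def toy_predict(prompt: str) -> str:
    """Hypothetical stand-in for an LLM: echoes the most recent action."""
    last_action_line = prompt.splitlines()[-2]
    return "You " + last_action_line[len("Action: "):] + "."


rollout = simulate(
    "put the apple in the fridge",
    "You are in a kitchen.",
    ["take apple", "open fridge"],
    toy_predict,
)
for step in rollout:
    print(f"{step.action} -> {step.observation}")
```

Swapping `toy_predict` for a real model call would leave the rollout loop unchanged, which is the appeal of the framing: the agent interacts with text predictions in place of the live environment.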

The study evaluated LLMs across five different text-based environments: ALFWorld, where agents perform household tasks; SciWorld, a simulation for scientific experiments; TextWorld, which presents narrative puzzles; WebShop, a shopping site where agents search for products; and StableToolBench, focused on API tool usage. This diverse set of environments provided a mix of structured tasks with clear rules and more variable scenarios, allowing the team to assess the models’ predictive accuracy over longer sequences, their scalability with increased data, and their practical utility in actual training scenarios.

Initial findings showed that pre-trained models demonstrated some capacity for modeling environments, with Claude-sonnet-4.5 achieving 77 percent accuracy in predicting outcomes in the household tasks of ALFWorld after just three examples. However, this accuracy was insufficient for more complicated tasks. The breakthrough came with additional fine-tuning using real interaction data, which enabled models like Qwen2.5-7B and Llama-3.1-8B to exceed 99 percent accuracy in ALFWorld, approximately 98.6 percent in SciWorld, and around 70 percent in TextWorld.
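The article does not spell out how prediction accuracy is scored. One plausible reading, sketched below as an assumption, is the share of steps where the predicted next observation matches the real one after light text normalization. The function names and the sample strings are hypothetical and are not taken from the study.

```python
from typing import Sequence


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not count as errors."""
    return " ".join(text.lower().split())


def next_state_accuracy(predicted: Sequence[str], actual: Sequence[str]) -> float:
    """Fraction of steps where the predicted observation equals the real one.

    Assumption: exact match after normalization. A real evaluation might
    use a softer comparison.
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        return 0.0
    matches = sum(normalize(p) == normalize(a) for p, a in zip(predicted, actual))
    return matches / len(actual)


# Illustrative strings only, not data from the study.
predicted = [
    "You open the fridge.",
    "you pick up the apple.",
    "The drawer is empty.",
    "You see a mug.",
]
actual = [
    "You open the fridge.",
    "You pick up the apple.",
    "The drawer contains a fork.",
    "You see a mug.",
]
print(next_state_accuracy(predicted, actual))
```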

Longer action sequences also remained reliable. In structured environments, the consistency ratio surpassed 90 percent, indicating that plans developed inside the world model succeeded in the real environment at rates comparable to those achieved through direct interaction. The e-commerce simulation proved harder, with consistency ratios averaging around 70 percent and varying significantly between agents. When simulated rollouts were initialized with real observations, consistency improved dramatically, nearing 100 percent even with a GPT-4o agent.
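The consistency ratio is not formally defined in this article. Going by the description above, a reasonable sketch is the success rate of world-model plans replayed in the real environment divided by the success rate of direct interaction. The definition, the function names, and the sample outcomes below are assumptions for illustration only.

```python
from typing import Sequence


def success_rate(outcomes: Sequence[bool]) -> float:
    """Share of episodes that ended in task success."""
    if not outcomes:
        raise ValueError("need at least one episode")
    return sum(outcomes) / len(outcomes)


def consistency_ratio(
    replayed_outcomes: Sequence[bool],
    direct_outcomes: Sequence[bool],
) -> float:
    """Assumed definition: replayed-plan success rate over direct success rate.

    replayed_outcomes: plans made inside the world model, then executed
                       in the real environment.
    direct_outcomes:   the same agent interacting with the real
                       environment from the start.
    """
    direct = success_rate(direct_outcomes)
    if direct == 0:
        raise ValueError("direct success rate is zero; ratio is undefined")
    return success_rate(replayed_outcomes) / direct


# Hypothetical outcomes, not figures from the study.
replayed = [True, True, True, False]
direct = [True, True, True, True]
print(consistency_ratio(replayed, direct))
```

Under this reading, a ratio near 1.0 means planning inside the simulator costs the agent little compared with acting in the real environment.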

As the researchers explored scaling, they identified distinct patterns. For structured environments, accuracy plateaued after about 20,000 training trajectories (recorded sequences of agent actions). In contrast, open environments such as the shopping site kept improving as data increased, up to 70,000 trajectories. Model size showed similar effects: 1.5-billion-parameter models performed well in structured settings, while more complex scenarios required larger models. The findings suggest that both data volume and model size must scale with the complexity of the environment for effective world modeling.

This research feeds into a growing debate about the future direction of AI, echoing concerns raised by Turing Award winner Richard Sutton. He has argued that the AI industry is at a crossroads and should shift toward continuous learning from experience rather than relying on pre-existing knowledge. In the essay “Welcome to the Era of Experience,” co-authored with DeepMind researcher David Silver, Sutton advocates for AI agents that learn from their own experience, with world models playing a crucial role as internal simulators.

While this study provides empirical evidence that LLMs can simulate environmental dynamics, it does not fully address Sutton’s concern about the need for continuous learning without forgetting past knowledge, an essential aspect of achieving true intelligence in AI systems.

Written by AiPressa Staff

The AiPressa Staff team brings you comprehensive coverage of the artificial intelligence industry, including breaking news, research developments, business trends, and policy updates. Our mission is to keep you informed about the rapidly evolving world of AI technology.


© 2025 AIPressa · Part of Buzzora Media · All rights reserved. This website provides general news and educational content for informational purposes only. While we strive for accuracy, we do not guarantee the completeness or reliability of the information presented. The content should not be considered professional advice of any kind. Readers are encouraged to verify facts and consult appropriate experts when needed. We are not responsible for any loss or inconvenience resulting from the use of information on this site. Some images used on this website are generated with artificial intelligence and are illustrative in nature. They may not accurately represent the products, people, or events described in the articles.