As artificial intelligence (AI) systems evolve, their limitations are shifting from linguistic competence to understanding and interacting with the physical world. In a conversation with Jeetu Patel, chief product officer at Cisco, Fei-Fei Li, CEO of World Labs, emphasized that while AI has made significant strides in handling language, it still struggles to reason effectively in environments where physical interaction is essential.
Li noted that the ability to navigate and interact with three-dimensional and four-dimensional spaces is crucial for systems that operate outside of traditional screens. “The ability to understand, to reason, to interact with and to navigate the real 3D, 4D physical world is the foundation,” she stated. She pointed out that AI systems lacking spatial intelligence cannot be reliably employed in fields such as robotics, design, simulation, healthcare, and logistics, where the cost of error is high.
Drawing a clear distinction between visual generation and spatial understanding, Li explained that many current AI models can create convincing images or videos but often fail to maintain consistency over time or perspective. In contrast, spatial models are designed to uphold structural integrity, enabling navigation and interaction. At World Labs, this insight informs the development of its first product, Marble, which utilizes multimodal inputs like text, images, and simple 3D prompts to generate interactive and navigable environments.
Despite being in its early stages, Marble is functionally distinct from video models. It maintains geometric consistency, which has practical implications across various industries. Game developers and virtual production teams are already adopting these spatial models for creating environments, while robotics teams use them for training and simulation. Architects and designers are also leveraging these models for spatial planning. Li added that, intriguingly, clinical researchers have expressed interest in using generated environments for mental health studies, where controlling physical context is challenging.
Li emphasized that the utility of spatial models lies in their ability to reduce the risks and costs associated with experimentation. Such models allow organizations to test scenarios digitally, mitigating the expenses and complexities of physical recreations. However, the evolution of spatial AI will not occur as rapidly as that of language models. Li pointed out that while text data is abundant and standardized, the same cannot be said for physical data, which is often scarce and challenging to obtain.
World Labs employs hybrid data strategies that merge vast online image and video resources with simulated and carefully gathered real-world data. Li compared this approach to the development timeline for autonomous vehicles, which required years of collecting real and simulated driving data before reaching limited commercial deployment.
As a result, contemporary spatial models are smaller and utilize fewer computational resources than advanced language models, reflecting both the data scarcity and the relative youth of the field. “Just because the North Star is clear doesn’t mean the journey is short,” Li remarked, highlighting the importance of architecture, data quality, and simulation fidelity over sheer scale.
Li’s comments challenge the prevailing narrative that positions compute power and capital investment as the primary drivers of AI progress. Instead, she identified data availability, simulation accuracy, and domain-specific knowledge as equally critical constraints when AI systems are expected to operate in the physical world. “Scale alone does not resolve the problem of understanding space,” she noted, emphasizing that advancements will depend significantly on these factors.
In the near term, Li suggested that the value of spatial AI will emerge from targeted applications where environments are well-defined and errors are manageable. The broader promise of spatial AI remains intact, but its timelines will depend more on data quality and execution capabilities than on ambition alone. As AI companies like Nvidia and OpenAI continue to explore this domain, the focus on spatial intelligence could reshape how machines interact with the world around us.