On December 5, 2025, SenseTime, in collaboration with Nanyang Technological University and several other research teams, unveiled NEO, touted as the world’s first scalable, open-source native multimodal architecture (Native VLM). The release marks a break from traditional modular “assembly-style” models, heralding a move toward genuine multimodal fusion in artificial intelligence.
NEO diverges from conventional vision-language systems such as GPT-4V and Claude 3.5, which typically rely on a pipeline of a separate vision encoder, a projection layer, and a language model. Instead, it features a unified multimodal “brain” designed to process all input modalities within a single network. This design rests on three native technologies: Native Patch Embedding, which constructs high-fidelity visual representations directly from pixel data; Native 3D Rotary Position Encoding, which allocates dedicated frequency bands to spatiotemporal coordinates; and Native Multi-Head Attention, which enables shared attention patterns across text and vision to close the semantic gap at the architectural level.
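To make the position-encoding idea concrete, the sketch below shows one plausible reading of 3D rotary position encoding: each attention head’s channels are split into three frequency bands for time, height, and width, and every token carries a (t, h, w) coordinate, with text tokens advancing along the time axis only. This is an illustrative reconstruction based on the description above, not SenseTime’s released implementation; the function names and the even three-way channel split are assumptions.

```python
# Illustrative sketch of 3D rotary position encoding, NOT SenseTime's NEO code.
# Assumption: head_dim is split evenly across the (t, h, w) axes, and text
# tokens advance only along the time axis.
import torch

def rope_freqs(dim: int, base: float = 10000.0) -> torch.Tensor:
    """Inverse frequencies for one axis; `dim` must be even."""
    return 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))

def rotary_3d(x: torch.Tensor, pos: torch.Tensor) -> torch.Tensor:
    """Apply 3D rotary position encoding to queries or keys.

    x:   (seq, heads, head_dim), with head_dim divisible by 6
    pos: (seq, 3) integer (t, h, w) coordinates per token
    """
    seq, heads, head_dim = x.shape
    axis_dim = head_dim // 3                          # channels per axis
    angles = []
    for axis in range(3):                             # t, h, w get separate bands
        inv = rope_freqs(axis_dim)                    # (axis_dim / 2,)
        angles.append(pos[:, axis:axis + 1].float() * inv)  # (seq, axis_dim / 2)
    theta = torch.cat(angles, dim=-1)                 # (seq, head_dim / 2)
    cos = theta.cos()[:, None, :]                     # broadcast over heads
    sin = theta.sin()[:, None, :]
    x1, x2 = x[..., 0::2], x[..., 1::2]               # interleaved channel pairs
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin              # standard 2D rotation
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

# Example: 4 text tokens (time axis only) followed by a 2x2 grid of image patches.
text_pos  = torch.tensor([[t, 0, 0] for t in range(4)])
image_pos = torch.tensor([[4, h, w] for h in range(2) for w in range(2)])
pos = torch.cat([text_pos, image_pos])                # (8, 3)
q = torch.randn(8, 2, 12)                             # head_dim=12 -> 4 per axis
q_rot = rotary_3d(q, pos)                             # keys are rotated the same way
print(q_rot.shape)                                    # torch.Size([8, 2, 12])
```

Because the rotation is relative, the dot product between two rotated vectors depends only on their coordinate offsets, which is what lets a single attention operation treat text order and image geometry uniformly.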
Initial evaluations indicate that NEO is competitive with leading models such as Qwen2-VL and InternVL3, particularly on visual tasks including AI2D and DocVQA. Remarkably, NEO achieves this with only 390 million image-text pairs, roughly one-tenth of the data used by comparable multimodal models. On broader benchmarks such as MMMU and MMBench, NEO matches or surpasses existing native VLMs, underscoring its overall capability.
Further enhancing its appeal, NEO’s models, spanning roughly 2 billion to 9 billion parameters, offer strong cost-efficiency at inference, making them well suited to deployment on mobile devices, in robotics, and in other edge scenarios. SenseTime has already open-sourced the 2 billion and 9 billion parameter versions of NEO and plans to extend the architecture to video understanding and 3D interaction. This expansion aims to broaden the capabilities of multimodal AI while helping advanced artificial intelligence move from centralized cloud systems to more accessible edge devices.
As a groundbreaking framework, NEO contributes to the evolving landscape of artificial intelligence and underscores the strides Chinese researchers are making in global AI architecture innovation. Its release may catalyze further advances in multimodal technology, reshaping how machines understand and interact with the world, with potential applications across a wide range of sectors.