Alibaba has released a comprehensive technical report on its Qwen3-VL multimodal model, just months after the model's launch. The report highlights two strengths in particular: image-based mathematical reasoning and the analysis of very long videos, areas where the open-weight model now competes directly with leading proprietary systems.
Qwen3-VL handles substantial inputs, fitting two-hour videos or hundreds of document pages into its native 256K-token context window, which can be extended to one million tokens. In "needle-in-a-haystack" tests, which hide a single semantically distinctive frame at a random position in a video and ask the model to find and describe it, the flagship 235-billion-parameter model achieved 100 percent retrieval accuracy on 30-minute videos. Even in two-hour videos spanning roughly one million tokens, accuracy stayed at 99.5 percent.
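To make the setup concrete, here is a toy Python sketch of such a probe. It is not Alibaba's evaluation harness; the frame labels, the frame count (30 minutes at an assumed one frame per second), and the scoring rule are illustrative assumptions.

```python
import random

def build_haystack(frames, needle):
    """Hide one distinctive 'needle' frame at a random position and
    remember where it went so retrieval can be scored later."""
    idx = random.randrange(len(frames) + 1)
    return frames[:idx] + [needle] + frames[idx:], idx

def is_hit(predicted_idx, true_idx, tolerance=0):
    # Count the answer as correct if the model points at (or near)
    # the inserted frame; the tolerance is an illustrative choice.
    return abs(predicted_idx - true_idx) <= tolerance

# 30 minutes of footage sampled at an assumed 1 frame per second.
haystack, true_idx = build_haystack(
    [f"frame_{i:04d}" for i in range(1800)], "NEEDLE"
)
assert haystack[true_idx] == "NEEDLE"
print(f"needle hidden at index {true_idx} of {len(haystack)}")
```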
In published benchmarks, Qwen3-VL frequently outperforms competitors such as Gemini 2.5 Pro, OpenAI's GPT-5, and Claude Opus 4.1, even when those rivals run with extended reasoning enabled or larger test-time compute budgets. For instance, Qwen3-VL scored 85.8 percent on the MathVista benchmark, ahead of GPT-5's 81.3 percent, and led MathVision with 74.6 percent, against 73.3 percent for Gemini 2.5 Pro and 65.8 percent for GPT-5.
Beyond visual mathematics, the model shows its range on specialized benchmarks, achieving 96.5 percent on the DocVQA document comprehension test and 875 of 1,000 points on OCRBench, where its text recognition now covers 39 languages, nearly four times as many as its predecessor. It also posts solid results on graphical user interface tasks, scoring 61.8 percent on ScreenSpot Pro and 63.7 percent on AndroidWorld, where it must operate Android apps autonomously.
Complex, multi-page PDF documents are not beyond its capabilities either, as Qwen3-VL scored 56.2 percent on MMLongBench-Doc for long document analysis. On the CharXiv benchmark for scientific charts, it achieved 90.5 percent on description tasks and 66.2 percent on complex reasoning questions. However, the model does face challenges; in the complex MMMU-Pro test, it scored 69.3 percent, falling short of GPT-5’s 78.4 percent, and commercial competitors generally outperform it in video question-answering benchmarks. Thus, Qwen3-VL appears to be a specialist in visual mathematics and document comprehension, while still lagging in broader reasoning capabilities.
Technical Details
The technical report outlines three key architectural changes. First, "interleaved MRoPE" replaces the previous position-embedding scheme: instead of assigning the temporal, height, and width axes to separate contiguous blocks of frequency dimensions, the new layout spreads each axis across the full frequency range, which is meant to improve performance on long videos. Second, DeepStack feeds features from intermediate layers of the vision encoder into the language model, giving it access to several levels of visual detail rather than only the encoder's final output. Third, a simplified text-based timestamp system replaces the previous model's T-RoPE mechanism: video frames are now marked with plain textual time indicators, which the report says improves the model's handling of time-based video tasks.
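The report gives the intuition for interleaved MRoPE rather than reference code, but the layout change can be sketched in a few lines. In the snippet below, the head dimension, frequency base, and round-robin axis assignment are illustrative assumptions rather than values from the paper; the point is only that interleaving lets each of the time, height, and width axes span both low- and high-frequency rotary dimensions, where a chunked layout confines each axis to one contiguous band.

```python
import numpy as np

def interleaved_mrope_angles(t, h, w, head_dim=64, base=10000.0):
    """Sketch of interleaved multi-axis rotary embeddings.

    Each dimension pair gets one rotary frequency; the temporal (t),
    height (h), and width (w) positions take turns claiming
    frequencies instead of splitting them into three blocks.
    """
    half = head_dim // 2                       # one angle per dimension pair
    freqs = base ** (-np.arange(half) / half)  # standard RoPE frequency ladder
    pos = np.array([(t, h, w)[i % 3] for i in range(half)], dtype=float)
    return pos * freqs                         # rotation angle for each pair

# A chunked layout, by contrast, would use something like
#   pos = np.array([t] * 22 + [h] * 21 + [w] * 21, dtype=float)
# so each axis stays inside one contiguous frequency band.
print(interleaved_mrope_angles(t=12, h=3, w=7)[:6])
```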
Alibaba trained Qwen3-VL in four phases using up to 10,000 GPUs. After an initial alignment stage linking images and text, the model went through full multimodal training on around one trillion tokens drawn from diverse sources, including web scrapes, three million PDFs from Common Crawl, and over 60 million STEM tasks. The context window was expanded progressively from 8K tokens to 32K and finally to 262,144 tokens. The "Thinking" variants received additional training to lay out their reasoning steps explicitly, improving performance on complex problems.
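As a rough picture of that progression, the staged schedule can be written down as a simple configuration. The stage names and the pairing of stages to context lengths below are assumptions for illustration; only the 8K to 32K to 262,144 progression comes from the report.

```python
# Illustrative staged-training schedule; stage names and the mapping of
# stages to context lengths are assumptions, not from the report.
SCHEDULE = [
    ("vision-language alignment", 8_192),
    ("full multimodal pretraining", 8_192),
    ("long-context extension", 32_768),
    ("ultra-long-context extension", 262_144),
]

for stage, max_len in SCHEDULE:
    print(f"{stage:>29}: sequences up to {max_len:,} tokens")
```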
All variants of Qwen3-VL released since September are available under the Apache 2.0 license with open weights on Hugging Face. This includes dense models ranging from 2B to 32B parameters, as well as mixture-of-experts models like the 30B-A3B and the massive 235B-A22B. While some features, such as extracting frames from lengthy videos, are not entirely new—Google’s Gemini 1.5 Pro demonstrated similar capabilities in early 2024—Qwen3-VL offers competitive performance in an open-source framework. Given the popularity of the previous Qwen2.5-VL model in research circles, the latest iteration is likely to accelerate further open-source innovations in the field.
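Since the weights are on Hugging Face, trying the model follows the usual transformers workflow. The snippet below is a minimal sketch assuming a recent transformers release with multimodal chat-template support; the repository id, the placeholder image URL, and the generic Auto classes are illustrative choices, so the snippet on the official model card should take precedence.

```python
from transformers import AutoModelForImageTextToText, AutoProcessor

model_id = "Qwen/Qwen3-VL-8B-Instruct"  # illustrative dense variant
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(model_id, device_map="auto")

# One image plus a question, in the standard chat-template format.
messages = [{"role": "user", "content": [
    {"type": "image", "url": "https://example.com/chart.png"},  # placeholder URL
    {"type": "text", "text": "What trend does this chart show?"},
]}]

inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt",
).to(model.device)

output = model.generate(**inputs, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt.
answer = processor.decode(
    output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(answer)
```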