Researchers Benchmark 19 Multimodal Models for Open-Vocabulary Emotion Recognition

Researchers from Cambridge and Hunan University benchmarked 19 multimodal models, finding that a two-stage fusion of audio, video, and text achieves the best emotion recognition accuracy, with video the most informative modality.

Researchers from the University of Cambridge, Hunan University, and other institutions have conducted the first large-scale evaluation of how effectively large multimodal models, which process text, audio, and video, recognize emotional expressions in real-world contexts. The work, led by Jing Han and colleagues, moves beyond classifying a fixed set of emotion categories to the harder task of recognizing a broad, open vocabulary of emotional cues, establishing benchmarks for the evolving field of emotional AI.

In their study, the team systematically tested 19 leading models, finding that the integration of audio, video, and text produced the most accurate results, with video being especially critical. The findings reveal that open-source models can compete closely with their closed-source counterparts, offering insights crucial for developing more sophisticated emotion recognition technologies.

Recent advances in large language models (LLMs) underscore the rapid evolution of multimodal AI, with significant contributions from companies such as Google and Alibaba. Google's Gemini family of models processes text, images, audio, and video and is designed to function as an agentic AI. Alibaba's Qwen series, which includes dedicated audio and language models, reported performance gains with Qwen2.5, while DeepSeek, a separate lab, has applied reinforcement learning to improve its models' reasoning capabilities.

A key focus of current research is the development of prompting techniques aimed at enhancing LLM performance. Strategies such as chain-of-thought prompting, self-consistency, and least-to-most prompting are being explored to refine the reasoning processes of these models. Direct Preference Optimization, a preference-alignment alternative to reinforcement learning from human feedback, is also being applied to improve the quality of model responses. The scope of this research extends beyond text into multimodal understanding, particularly video and audio processing. Models like LLaVA-Video and Tarsier2 are making strides in video comprehension, while Qwen-Audio aims for unified audio-language processing. Researchers are also investigating methods to improve temporal understanding in video LLMs and scale the performance of open-source multimodal models.
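To make the prompting strategies above concrete, here is a minimal sketch of self-consistency: sample several chain-of-thought answers from a model and keep the majority vote. The `sample_fn` callable is a hypothetical stand-in for an actual LLM API call, which is not specified in the article.

```python
from collections import Counter

def self_consistency(prompt, sample_fn, n_samples=5):
    """Sample the model n_samples times on the same prompt and
    return the most frequent final answer (majority vote)."""
    answers = [sample_fn(prompt) for _ in range(n_samples)]
    winner, _count = Counter(answers).most_common(1)[0]
    return winner

# Canned responses simulate five stochastic chain-of-thought samples;
# a real sample_fn would call a model with temperature > 0.
canned = iter(["joy", "joy", "surprise", "joy", "sadness"])
result = self_consistency("Describe the speaker's emotion.",
                          lambda p: next(canned))
# result == "joy", the answer appearing in 3 of the 5 samples
```

The voting step is what distinguishes self-consistency from plain chain-of-thought: individual reasoning chains may err, but agreement across samples is a useful correctness signal.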

A comprehensive evaluation framework has been constructed using the OV-MERD dataset to assess the reasoning, fusion strategies, and prompt design of 19 mainstream multimodal large language models (MLLMs) in open-vocabulary emotion recognition. This benchmarking reveals both the capabilities and limitations of current MLLMs in understanding nuanced emotional expressions. The study builds on previous methodologies that used emotional cues, extending them with new architectures for improved performance.
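Open-vocabulary evaluation means predictions are free-form label sets rather than picks from a fixed taxonomy, so scoring reduces to comparing label sets. The sketch below uses exact lowercase matching; this is a simplified assumption, as the actual OV-MERD protocol may group synonymous labels before scoring.

```python
def set_overlap_scores(predicted, reference):
    """Precision/recall/F1 between a free-form predicted emotion
    label set and a multi-label ground-truth set, after normalizing
    case and whitespace. Exact string match is a simplification."""
    pred = {p.strip().lower() for p in predicted}
    ref = {r.strip().lower() for r in reference}
    if not pred or not ref:
        return 0.0, 0.0, 0.0
    hits = len(pred & ref)                      # labels both sets share
    precision = hits / len(pred)
    recall = hits / len(ref)
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1

p, r, f1 = set_overlap_scores(["Angry", "frustrated", "tense"],
                              ["angry", "frustrated"])
# precision = 2/3, recall = 1.0, f1 = 0.8
```

Set-based scoring rewards models for covering the ground-truth emotions without penalizing them for using a different label order, which matters when models generate labels freely.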

Through experimentation, researchers determined that a two-stage trimodal fusion—which integrates audio, video, and text—achieves optimal performance in emotion recognition. Video was identified as the most critical modality, significantly enhancing accuracy compared to audio or text alone. In-depth analysis of prompt engineering indicated a surprisingly narrow performance gap between open-source and closed-source LLMs. This research has established essential benchmarks and offers actionable guidelines for advancing fine-grained affective computing.
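The two-stage trimodal fusion described above can be sketched at a high level: first combine the two perceptual streams (audio and video), then merge the result with text. The article does not specify the fusion mechanism, so plain concatenation stands in for whatever learned fusion the models actually use; the feature vectors and stage ordering here are illustrative assumptions.

```python
def concat(*vecs):
    """Concatenate feature vectors represented as plain lists."""
    out = []
    for v in vecs:
        out.extend(v)
    return out

def fuse_two_stage(audio, video, text):
    """Stage 1: fuse the audio and video streams.
    Stage 2: merge the audiovisual result with the text features.
    Concatenation is a stand-in for a learned fusion layer."""
    audiovisual = concat(audio, video)   # stage 1: perceptual fusion
    return concat(audiovisual, text)     # stage 2: add language

fused = fuse_two_stage([0.1, 0.2], [0.3, 0.4, 0.5], [0.6])
# fused == [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
```

Staging the fusion lets the model align the tightly coupled audio and video signals first, consistent with the study's finding that video carries the most emotional information.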

Insights from this research suggest that complex reasoning models, despite their advantages, do not necessarily outperform simpler models at direct emotion identification. The study highlights the need for future work on more comprehensive datasets, multilingual evaluations, and more sophisticated multimodal fusion techniques, which could further refine AI's ability to understand and interpret emotions. As the field evolves, applications in domains such as healthcare and education promise significant advances in human-computer interaction.

👉 More information
🗞 Pioneering Multimodal Emotion Recognition in the Era of Large Models: From Closed Sets to Open Vocabularies
🧠 ArXiv: https://arxiv.org/abs/2512.20938

Written By: AiPressa Staff

© 2025 AIPressa · Part of Buzzora Media · All rights reserved.