Since its public release in late 2022, OpenAI’s ChatGPT has been at the forefront of the generative AI chatbot phenomenon. It has maintained a significant portion of the market share despite the emergence of formidable competitors like Gemini, Grok, Claude, Qwen, DeepSeek, and Mistral. However, a recent study conducted by the British company Prolific has cast a surprising light on its standing among these models.
Prolific’s research ranks ChatGPT in the eighth position on its newly developed benchmark, called “Humaine.” This places it behind several rivals, including Google’s Gemini models, xAI’s Grok, DeepSeek, and even a model from Mistral, a French AI company. The Humaine benchmark is designed to evaluate AI performance through the lens of natural human interaction, a significant shift from traditional metrics, which often prioritize specialized datasets and complex reasoning tasks that may not resonate with everyday users.
Prolific’s blog post highlights a critical gap in current evaluation methods, arguing that existing metrics tend to favor researchers while failing to capture what users genuinely value. The emphasis on esoteric reasoning tasks creates a disconnect between what models are optimized for and what users actually experience. The study also points to potential shortcomings of human-preference leaderboards, suggesting they may be skewed towards tech-savvy participants and therefore fail to represent broader user preferences.
New Ranking Insights
According to the findings from the Humaine study, the top ten includes the following ranked models:
- 1. Gemini 2.5 Pro (Google)
- 2. DeepSeek v3 (DeepSeek)
- 3. Magistral Medium (Mistral)
- 6. Gemini 2.5 Flash (Google)
- 7. DeepSeek R1 (DeepSeek)
- 8. ChatGPT (OpenAI)
- 10. Gemini 2.0 Flash (Google)
This ranking is particularly notable given its September publication, before Google had launched its Gemini 3 Pro model and before xAI had released its Grok 4.1 and Grok 4.1 Thinking models. The sustained dominance of Gemini 2.5 Pro across various leaderboards since its introduction has set high expectations in the industry.
ChatGPT’s placement outside the top five, behind models from competitors like DeepSeek, Grok, and Mistral, would mark a significant shift if the results hold. While the researchers did not offer explicit reasons for ChatGPT’s lower ranking, they emphasized that Gemini 2.5 Pro consistently emerged as the top model on the “Overall Winner” metric.
This new landscape presents a challenge for OpenAI, as the results call into question the model’s perceived capability relative to its rivals. As the competition heats up in the generative AI field, the implications for developers and users alike are profound. Companies are increasingly required to align their models not only with technical excellence but also with user-centric metrics that genuinely reflect real-world interactions.
The evolution of benchmarks like Humaine suggests a potential shift in how AI performance is evaluated in the future. By prioritizing user engagement and relevance, the AI community may be on the cusp of redefining its standards, ensuring that models like ChatGPT can adapt to a landscape where user experience takes precedence.
As the AI ecosystem continues to evolve, the challenge for established players like OpenAI will be to maintain their relevance in the face of innovative metrics and competitive pressures. The findings from Prolific’s Humaine benchmark serve as a reminder that in the fast-paced world of AI, staying ahead requires not just technological prowess, but a deep understanding of user needs and preferences.