Study Reveals ChatGPT Drops to 8th Place, Surpassed by 7 Competing AI Models

OpenAI’s ChatGPT drops to eighth place in Prolific’s Humaine benchmark, surpassed by seven competitors including Google’s Gemini 2.5 Pro.

Since its public release in late 2022, OpenAI’s ChatGPT has been at the forefront of the generative AI chatbot boom. It has retained a large share of the market despite the arrival of formidable competitors such as Gemini, Grok, Claude, Qwen, DeepSeek, and Mistral. However, a recent study by the British company Prolific casts its standing among these models in a surprising light.

Prolific’s research places ChatGPT eighth on its newly developed benchmark, called “Humaine,” behind several rival models, including ones from Google, xAI, and DeepSeek, and even one from the French AI company Mistral. Humaine is designed to evaluate AI performance through natural human interaction. This is a marked shift from traditional metrics, which often prioritize specialized datasets and complex reasoning tasks that may mean little to everyday users.

Prolific’s blog post highlights a critical gap in current evaluation methods: existing metrics tend to serve researchers while failing to capture what users actually value. The emphasis on esoteric reasoning tasks creates a disconnect between what models are optimized for and what users experience. The study also points to shortcomings in human-preference leaderboards, suggesting that they may draw disproportionately on tech-savvy participants and so fail to represent the preferences of the wider public.

New Ranking Insights

According to the Humaine study, the top ten AI models include the following:

  • 1. Gemini 2.5 Pro (Google)
  • 2. DeepSeek v3 (DeepSeek)
  • 3. Magistral Medium (Mistral)
  • 6. Gemini 2.5 Flash (Google)
  • 7. DeepSeek R1 (DeepSeek)
  • 10. Gemini 2.0 Flash (Google)

Note that the ranking was published in September, before Google launched its Gemini 3 Pro model and before xAI released its Grok 4.1 and Grok 4.1 Thinking models. Gemini 2.5 Pro’s sustained dominance across various leaderboards since its introduction has set high expectations in the industry.

ChatGPT’s placement outside the top five, and in particular behind models from competitors such as DeepSeek, xAI, and Mistral, would mark a significant shift if the results hold up. The researchers did not give explicit reasons for ChatGPT’s lower ranking, but they emphasized that Gemini 2.5 Pro consistently emerged as the top model on the “Overall Winner” metric.

These results present a challenge for OpenAI, as they call into question ChatGPT’s perceived capability relative to its rivals. As competition in generative AI intensifies, the stakes are high for developers and users alike. Companies face growing pressure to align their models not only with technical excellence but also with user-centric metrics that reflect real-world interactions.

The emergence of benchmarks like Humaine suggests that the way AI performance is evaluated may change. By prioritizing user engagement and relevance, the AI community may be on the cusp of redefining its standards, pushing models like ChatGPT to adapt to a landscape where user experience takes precedence.

As the AI ecosystem continues to evolve, the challenge for established players like OpenAI will be to stay relevant in the face of new evaluation metrics and competitive pressure. The findings from Prolific’s Humaine benchmark are a reminder that in the fast-paced world of AI, staying ahead requires not just technological prowess but a deep understanding of user needs and preferences.

Written by AiPressa Staff


© 2025 AIPressa · Part of Buzzora Media · All rights reserved. This website provides general news and educational content for informational purposes only. While we strive for accuracy, we do not guarantee the completeness or reliability of the information presented. The content should not be considered professional advice of any kind. Readers are encouraged to verify facts and consult appropriate experts when needed. We are not responsible for any loss or inconvenience resulting from the use of information on this site. Some images used on this website are generated with artificial intelligence and are illustrative in nature. They may not accurately represent the products, people, or events described in the articles.