


Study Reveals ChatGPT Drops to 8th Place, Surpassed by 7 Competing AI Models

OpenAI’s ChatGPT drops to eighth place in Prolific’s Humaine benchmark, surpassed by seven competitors including Google’s Gemini 2.5 Pro.

Since its public release in late 2022, OpenAI’s ChatGPT has been at the forefront of the generative AI chatbot boom. It has retained a large share of the market despite the emergence of formidable competitors such as Gemini, Grok, Claude, Qwen, DeepSeek, and Mistral. However, a recent study by the British company Prolific paints a surprising picture of its standing among these models.

Prolific’s research ranks ChatGPT eighth on its newly developed benchmark, “Humaine,” placing it behind models from Google (Gemini), xAI (Grok), DeepSeek, and even one from the French AI company Mistral. Humaine is designed to evaluate AI performance through natural human interaction, a significant shift from traditional metrics, which often prioritize specialized datasets and complex reasoning tasks that may matter little to everyday users.

In a blog post, Prolific argues that current evaluation methods have a critical gap: existing metrics tend to serve researchers while failing to capture what users genuinely value. The emphasis on esoteric reasoning tasks opens a disconnect between what models are optimized for and what users actually experience. The study also points to shortcomings of human-preference leaderboards, suggesting they may skew toward tech-savvy participants and therefore fail to represent broader user preferences.

New Ranking Insights

According to the Humaine study, the top ten AI models include the following:

  • 1. Gemini 2.5 Pro (Google)
  • 2. DeepSeek v3 (DeepSeek)
  • 3. Magistral Medium (Mistral)
  • 6. Gemini 2.5 Flash (Google)
  • 7. DeepSeek R1 (DeepSeek)
  • 10. Gemini 2.0 Flash (Google)

The ranking is particularly notable because it was published in September, before Google launched its Gemini 3 Pro model and before xAI released Grok 4.1 and Grok 4.1 Thinking. Gemini 2.5 Pro has led various leaderboards since its introduction, setting high expectations across the industry.

ChatGPT’s placement outside the top five, and in particular behind models from competitors such as DeepSeek, xAI, and Mistral, marks a significant shift if the results hold. The researchers did not give explicit reasons for ChatGPT’s lower ranking, but they emphasized that Gemini 2.5 Pro consistently came out on top in the “Overall Winner” metric.

These results pose a challenge for OpenAI, since they call into question ChatGPT’s perceived capability relative to its rivals. As competition in generative AI intensifies, developers face growing pressure to build models that excel not only on technical benchmarks but also on user-centric metrics that reflect real-world interactions.

Benchmarks like Humaine point to a possible shift in how AI performance will be evaluated. By prioritizing user engagement and relevance, the AI community may be on the cusp of redefining its standards, pushing models like ChatGPT to adapt to a landscape where user experience takes precedence.

As the AI ecosystem continues to evolve, the challenge for established players like OpenAI will be to maintain their relevance in the face of innovative metrics and competitive pressures. The findings from Prolific’s Humaine benchmark serve as a reminder that in the fast-paced world of AI, staying ahead requires not just technological prowess, but a deep understanding of user needs and preferences.

Written By Staff

The AiPressa Staff team brings you comprehensive coverage of the artificial intelligence industry, including breaking news, research developments, business trends, and policy updates. Our mission is to keep you informed about the rapidly evolving world of AI technology.


© 2025 AIPressa · Part of Buzzora Media · All rights reserved.