
Study Reveals ChatGPT Drops to 8th Place, Surpassed by 7 Competing AI Models

OpenAI’s ChatGPT drops to eighth place in Prolific’s Humaine benchmark, surpassed by seven competitors including Google’s Gemini 2.5 Pro.

Since its public release in late 2022, OpenAI’s ChatGPT has been at the forefront of the generative AI chatbot phenomenon. It has retained a significant share of the market despite the emergence of formidable competitors like Gemini, Grok, Claude, Qwen, DeepSeek, and Mistral. However, a recent study by the British company Prolific casts its standing among these models in a surprising light.

Prolific’s research ranks ChatGPT eighth on its newly developed benchmark, called “Humaine.” That places it behind several rivals, including Google’s Gemini models, xAI’s Grok, DeepSeek’s models, and even one from Mistral, a French AI company. The Humaine benchmark is designed to evaluate AI performance through the lens of natural human interaction, a significant shift from traditional metrics, which often prioritize specialized datasets and complex reasoning tasks that may not resonate with everyday users.

Prolific’s blog post highlights a critical gap in current evaluation methods, stating that the existing metrics tend to favor researchers while failing to address what users genuinely value. The emphasis on esoteric reasoning tasks creates a disconnect between optimization and actual user experience. Moreover, the study points out the potential shortcomings of human-preference leaderboards, suggesting they may be skewed towards tech-savvy participants and thus fail to represent broader user preferences adequately.

New Ranking Insights

According to the findings from the Humaine study, the top ten AI models include the following (ranks 4, 5, 8, and 9 are missing from this list):

  • 1. Gemini 2.5 Pro (Google)
  • 2. DeepSeek v3 (DeepSeek)
  • 3. Magistral Medium (Mistral)
  • 6. Gemini 2.5 Flash (Google)
  • 7. DeepSeek R1 (DeepSeek)
  • 10. Gemini 2.0 Flash (Google)

The ranking was published in September, before Google launched its Gemini 3 Pro model and before xAI released its Grok 4.1 and Grok 4.1 Thinking models, so it does not reflect those releases. Gemini 2.5 Pro has dominated various leaderboards since its introduction, which has set high expectations in the industry.

ChatGPT’s placement outside the top five, particularly behind models from competitors like DeepSeek, Grok, and Mistral, marks a significant shift if the results hold true. While the researchers have not provided explicit reasons for ChatGPT’s lower ranking, they emphasized that Gemini 2.5 Pro consistently emerged as the top model for the “Overall Winner” metric.

These results pose a challenge for OpenAI, as they call into question ChatGPT’s perceived capability relative to its rivals. As competition in generative AI intensifies, companies are increasingly expected to align their models not only with technical excellence but also with user-centric metrics that reflect real-world interactions.

Benchmarks like Humaine suggest a potential shift in how AI performance will be evaluated. By prioritizing user engagement and relevance, the AI community may be on the cusp of redefining its standards, and models like ChatGPT will need to adapt to a landscape where user experience takes precedence.

As the AI ecosystem continues to evolve, the challenge for established players like OpenAI will be to maintain their relevance in the face of innovative metrics and competitive pressures. The findings from Prolific’s Humaine benchmark serve as a reminder that in the fast-paced world of AI, staying ahead requires not just technological prowess, but a deep understanding of user needs and preferences.

Written by AiPressa Staff


© 2025 AIPressa · Part of Buzzora Media · All rights reserved.