Prolific Study Ranks ChatGPT 8th, Behind Gemini and DeepSeek Models

Prolific’s new Humaine benchmark ranks OpenAI’s ChatGPT eighth, behind competitors including Google’s Gemini and DeepSeek.

Since its launch in late 2022, OpenAI’s ChatGPT has set the standard for generative AI chatbots and captured a significant share of the market. However, a recent study by British firm Prolific ranks ChatGPT eighth among leading AI models, behind competitors such as Gemini, Grok, Claude, DeepSeek, and Mistral.

Prolific developed a new benchmark, “Humaine,” to evaluate AI performance against human interaction standards rather than purely technical metrics. The company criticizes current evaluation methods for focusing on data that matters more to researchers than to everyday users. “Current evaluation is heavily skewed towards metrics that are meaningful to researchers but opaque to everyday users,” Prolific wrote in a blog post, pointing to a disconnect between what models are optimized for and what users experience. The company raises a related concern about human-preference leaderboards, which can suffer from sample bias because their participants are often tech-savvy.

ChatGPT’s Ranking and Market Context

According to the Humaine study, the ranking includes the following models:

  1. Gemini 2.5 Pro (Google)
  2. DeepSeek v3 (DeepSeek)
  3. Magistral Medium (Mistral)
  4. Gemini 2.5 Flash (Google)
  5. DeepSeek R1 (DeepSeek)
  6. Gemini 2.0 Flash (Google)
  7. ChatGPT

The ranking is particularly surprising for OpenAI, whose model now sits behind not only Google’s Gemini models but also offerings from DeepSeek and Mistral. The study was published in September, before the release of Google’s Gemini 3 Pro and xAI’s Grok 4.1, both of which may affect future rankings.

Despite ongoing advances from competitors, Gemini 2.5 Pro’s place at the top comes as little surprise: the model has led a number of performance benchmarks since its introduction. OpenAI’s absence from the top five, however, raises questions about its standing in an increasingly competitive landscape.

Addressing Evaluation Gaps

The Prolific study highlights the need for more rigorous and relevant methods of evaluating AI models. By building automated quality monitoring into the Humaine benchmark, Prolific aims to ensure that the feedback and interactions it collects genuinely reflect user preferences rather than being skewed by sample bias.

As AI technology evolves, it becomes essential to understand how models perform under conditions that mimic real human interaction. The insights from this study could guide developers in refining their models to better meet user expectations, narrowing the gap between technical performance and user satisfaction.

In summary, while OpenAI’s ChatGPT remains a significant player in generative AI, its eighth-place ranking indicates a shift in the competitive landscape. As companies like Google and DeepSeek continue to innovate, evaluating AI through the lens of user interaction will likely become increasingly important for staying relevant in the market.

Written by AiPressa Staff

The AiPressa Staff team brings you comprehensive coverage of the artificial intelligence industry, including breaking news, research developments, business trends, and policy updates. Our mission is to keep you informed about the rapidly evolving world of AI technology.

© 2025 AIPressa · Part of Buzzora Media · All rights reserved. This website provides general news and educational content for informational purposes only. While we strive for accuracy, we do not guarantee the completeness or reliability of the information presented. The content should not be considered professional advice of any kind. Readers are encouraged to verify facts and consult appropriate experts when needed. We are not responsible for any loss or inconvenience resulting from the use of information on this site. Some images used on this website are generated with artificial intelligence and are illustrative in nature. They may not accurately represent the products, people, or events described in the articles.