Since its public release in late 2022, OpenAI’s ChatGPT has been at the forefront of the generative AI chatbot phenomenon. It has maintained a significant portion of the market share despite the emergence of formidable competitors like Gemini, Grok, Claude, Qwen, DeepSeek, and Mistral. However, a recent study conducted by the British company Prolific has cast a surprising light on its standing among these models.
Prolific’s research ranks ChatGPT in the eighth position on its newly developed benchmark, called “Humaine.” This places it behind several rivals, including Google’s Gemini models, xAI’s Grok, DeepSeek, and even a model from Mistral, a French AI company. The Humaine benchmark is designed to evaluate AI performance through the lens of natural human interaction, a significant shift from traditional metrics, which often prioritize specialized datasets and complex reasoning tasks that may not resonate with everyday users.
Prolific’s blog post highlights a critical gap in current evaluation methods, arguing that existing metrics tend to favor researchers while failing to capture what users genuinely value. The emphasis on esoteric reasoning tasks creates a disconnect between what models are optimized for and what users actually experience. The study also points to potential shortcomings of human-preference leaderboards, suggesting they may be skewed towards tech-savvy participants and therefore fail to represent broader user preferences.
New Ranking Insights
According to the findings from the Humaine study, the top ten includes the following ranked models:
- 1. Gemini 2.5 Pro (Google)
- 2. DeepSeek v3 (DeepSeek)
- 3. Magistral Medium (Mistral)
- 6. Gemini 2.5 Flash (Google)
- 7. DeepSeek R1 (DeepSeek)
- 8. ChatGPT (OpenAI)
- 10. Gemini 2.0 Flash (Google)
This ranking is particularly notable given its September publication, before Google had launched its Gemini 3 Pro model and before xAI had released its Grok 4.1 and Grok 4.1 Thinking models. The sustained dominance of Gemini 2.5 Pro across various leaderboards since its introduction has set high expectations in the industry.
ChatGPT’s placement outside the top five, behind models from competitors like DeepSeek, Grok, and Mistral, would mark a significant shift if the results hold. While the researchers did not offer explicit reasons for ChatGPT’s lower ranking, they emphasized that Gemini 2.5 Pro consistently emerged as the top model on the “Overall Winner” metric.
This new landscape presents a challenge for OpenAI, as the results call into question the model’s perceived capability relative to its rivals. As the competition heats up in the generative AI field, the implications for developers and users alike are profound. Companies are increasingly required to align their models not only with technical excellence but also with user-centric metrics that genuinely reflect real-world interactions.
The evolution of benchmarks like Humaine suggests a potential shift in how AI performance is evaluated in the future. By prioritizing user engagement and relevance, the AI community may be on the cusp of redefining its standards, ensuring that models like ChatGPT can adapt to a landscape where user experience takes precedence.
As the AI ecosystem continues to evolve, the challenge for established players like OpenAI will be to maintain their relevance in the face of innovative metrics and competitive pressures. The findings from Prolific’s Humaine benchmark serve as a reminder that in the fast-paced world of AI, staying ahead requires not just technological prowess, but a deep understanding of user needs and preferences.