UGI Leaderboard Launches to Rank AI Models Based on Censorship Levels and Response Quality

A community-built UGI Leaderboard on Hugging Face ranks AI models by censorship levels, with Grok-4-0709 leading at a UGI score of 68.75 for sensitive-topic engagement.

The UGI Leaderboard, hosted on Hugging Face’s community space, lets users compare AI models on how they handle sensitive topics. Launched on January 16, 2026, it ranks models by their responses to such prompts, showing how heavily each model is censored and how willing it is to engage in sensitive discussions.

The leaderboard was created by a user who goes by DontPlanToEnd. It scores models on their ability to address “danger” topics, controversial entertainment knowledge, and sensitive socio-political issues. UGI stands for Uncensored General Intelligence, and the benchmark measures how far a model’s responses are constrained by censorship. A model’s ability to answer meaningfully, especially on questions usually treated as risky, weighs heavily in its score.

Models are evaluated on two primary criteria: the UGI score and a measure called Willingness/10 (W/10). The UGI score reflects how well a model engages with sensitive content without refusing to answer. The top-ranked model, Grok-4-0709, achieved a UGI score of 68.75, with DeepSeek-V3.2-Speciale close behind at 67.93. Notably, Grok-4-0709 scored a relatively low 6.5 on dangerous-content questions, compared with 7.5 on adult or controversial topics.

DeepSeek-V3.2-Speciale, by comparison, scored 8.8 on dangerous-content questions, suggesting a different approach to sensitive material. The willingness score, which indicates how readily a model complies with sensitive instructions rather than refusing them, also varies across models: Grok-4-0709 scored 6, DeepSeek-V3.2-Speciale a middling 4.8, and Mistral-Large-Instruct-2411 the highest of the three at 7.5, signaling a greater propensity to engage with challenging questions.
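The figures above show that the two metrics can order models differently: Grok-4-0709 leads on UGI, while Mistral-Large-Instruct-2411 leads on W/10 among the models cited. The short Python sketch below ranks only the numbers quoted in this article. The dictionary layout and the `rank_by` helper are illustrative assumptions, not the leaderboard’s own code or data format, and Mistral’s UGI score is left empty because the article does not report it.

```python
# Figures quoted in this article only (illustrative, not the leaderboard's data format).
# Mistral-Large-Instruct-2411's UGI score was not reported, so it is None.
scores = {
    "Grok-4-0709": {"ugi": 68.75, "w10": 6.0},
    "DeepSeek-V3.2-Speciale": {"ugi": 67.93, "w10": 4.8},
    "Mistral-Large-Instruct-2411": {"ugi": None, "w10": 7.5},
}


def rank_by(metric: str) -> list[str]:
    """Hypothetical helper: return model names from highest to lowest on
    `metric`, skipping models with no reported value for that metric."""
    rated = [(name, s[metric]) for name, s in scores.items() if s[metric] is not None]
    return [name for name, _ in sorted(rated, key=lambda item: item[1], reverse=True)]


print(rank_by("ugi"))  # ['Grok-4-0709', 'DeepSeek-V3.2-Speciale']
print(rank_by("w10"))  # ['Mistral-Large-Instruct-2411', 'Grok-4-0709', 'DeepSeek-V3.2-Speciale']
```

Sorting on one metric at a time mirrors how a reader would re-sort a leaderboard column; combining the metrics into a single ranking would require a weighting the article does not describe.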

Beyond censorship and willingness, the leaderboard also measures general knowledge and reasoning under an Intelligence score, covering domains such as textbook knowledge and pop culture. It also includes a writing evaluation that measures each model’s stylistic range from safe for work (SFW) to not safe for work (NSFW), and it assesses political orientation to surface possible ideological biases.

The UGI Leaderboard is a useful tool for evaluating AI models, but its scores come from a volunteer-driven benchmarking process and should be interpreted with caution. Users looking for less censored AI systems can use it to identify models that are more open to difficult discussions. A high score does not mean a model will necessarily produce problematic responses; it indicates a tendency to engage with sensitive topics rather than refuse.

The initiative reflects growing interest in the AI community in transparency about how models handle contentious requests. As AI systems evolve, benchmarks like the UGI Leaderboard may help users understand, and choose between, the different ways models respond to sensitive prompts.

Written By AiPressa Staff

The AiPressa Staff team brings you comprehensive coverage of the artificial intelligence industry, including breaking news, research developments, business trends, and policy updates. Our mission is to keep you informed about the rapidly evolving world of AI technology.

© 2025 AIPressa · Part of Buzzora Media · All rights reserved. This website provides general news and educational content for informational purposes only. While we strive for accuracy, we do not guarantee the completeness or reliability of the information presented. The content should not be considered professional advice of any kind. Readers are encouraged to verify facts and consult appropriate experts when needed. We are not responsible for any loss or inconvenience resulting from the use of information on this site. Some images used on this website are generated with artificial intelligence and are illustrative in nature. They may not accurately represent the products, people, or events described in the articles.