Connect with us

Hi, what are you looking for?

Top Stories

Hugging Face Launches ML Intern, Outperforming Claude Code in Scientific Reasoning

Hugging Face launches ML Intern, an open-source AI agent that surpasses Claude Code in scientific reasoning with a 32% GPQA score, offering $1,000 in GPU credits for early adopters

Hugging Face has launched ML Intern, an open-source AI agent designed to autonomously research, write, and execute machine learning code. Initial benchmarks indicate that it surpasses Anthropic’s Claude Code in scientific reasoning and OpenAI’s Codex in healthcare evaluations. The tool, developed by Hugging Face’s AI agents team, is being marketed as an automated counterpart to the post-training research loop utilized by the company’s machine learning researchers. It is available immediately as both a command-line interface (CLI) and a mobile and desktop web application.

Aksel Joonas Reedi, a team member focusing on AI agents at Hugging Face, announced the launch via LinkedIn. He detailed that ML Intern can pull academic papers from arXiv and hf.co/papers, navigate citation graphs, reformat datasets, and initiate training jobs on Hugging Face Jobs when local GPUs are unavailable. To incentivize early users, Hugging Face is providing $1,000 in GPU resources and credits from Anthropic.

Reedi further explained that ML Intern’s capabilities include training the best large language model (LLM) for scientific reasoning. Through citation searches, it identified relevant NVIDIA research, such as OpenScience and NemoTron-CrossThink, before executing 12 supervised fine-tuning passes on Qwen3-1.7B. This effort resulted in a score increase from 10% to 32% on GPQA in less than 10 hours, compared to Claude Code’s top score of 22.99%, he noted.

In a healthcare evaluation, ML Intern determined that existing datasets were inadequate, subsequently writing a script to generate 1,100 synthetic data points covering emergency, client, and multilingual communication. The agent then upsampled the data 50 times for training, achieving a performance improvement of 60% over Codex on HealthBench. For a competitive mathematics challenge, it created a complete GRPO training script and executed it on A100 GPUs via Hugging Face Spaces, running ablation tests until successful outcomes were achieved.

The project’s public documentation outlines that ML Intern operates an agentic loop capable of up to 300 iterations per task. It features a context manager to handle message history and auto compaction, a tool router linking Hugging Face documentation, datasets, jobs, and papers, as well as GitHub code search and sandboxed execution. The CLI can be installed via uv and accepts any inference provider model ID, with the default configuration directing users to Anthropic’s Claude models.

Reedi emphasized on LinkedIn that the agent is designed to closely reflect the work and thinking patterns of researchers, asserting that it “knows how data should look like and what good models feel like.”

In conjunction with the launch, Hugging Face has allocated $1,000 in GPU credits and Anthropic resources for the fastest early adopters of the tool. This initiative coincides with growing pressure on universities, bootcamps, and EdTech startups to provide students and staff with hands-on experience in model training without incurring high commercial cloud fees.

The pressing question now is how ML Intern’s autonomous capabilities will perform in less controlled, real-world educational datasets, where issues of data quality, consent, and licensing may arise. Hugging Face asserts that the agent is open source and built within its own ecosystem, allowing the community to publicly test its boundaries and limitations.

See also
Staff
Written By

The AiPressa Staff team brings you comprehensive coverage of the artificial intelligence industry, including breaking news, research developments, business trends, and policy updates. Our mission is to keep you informed about the rapidly evolving world of AI technology.

You May Also Like

AI Cybersecurity

Anthropic's Mythos exposes thousands of critical vulnerabilities in major systems, prompting $100M in defensive action from tech giants and U.S. banks.

AI Government

US Department of Defense partners with tech giants including SpaceX and OpenAI to launch an "AI-first" initiative aimed at enhancing military decision-making efficiency.

AI Research

OpenAI's o1 model achieves 81.6% diagnostic accuracy in emergency situations, surpassing human doctors and signaling a major shift in medical practice.

AI Generative

OpenAI unveils GPT Image 2, achieving a record 242-point lead over competitors, transforming the AI image generation landscape with native reasoning capabilities.

AI Business

Nvidia CEO Jensen Huang urges industry leaders to avoid alarmist claims about AI's future, citing concerns over inaccurate predictions like a 50% job displacement...

AI Government

Anthropic accuses Moonshot AI of 3.4M unauthorized exchanges with its Claude chatbot, prompting a global U.S. State Department campaign against IP theft.

AI Research

Mass General Brigham unveils APOLLO, a transformative AI model trained on 25 billion medical events, achieving a 0.92 AUROC for predicting schizophrenia risks.

AI Technology

Apple CEO Tim Cook warns of several-month supply shortages for the Mac mini and Mac Studio as demand surges, pushing Mac revenue to $8.4...

© 2025 AIPressa · Part of Buzzora Media · All rights reserved. This website provides general news and educational content for informational purposes only. While we strive for accuracy, we do not guarantee the completeness or reliability of the information presented. The content should not be considered professional advice of any kind. Readers are encouraged to verify facts and consult appropriate experts when needed. We are not responsible for any loss or inconvenience resulting from the use of information on this site. Some images used on this website are generated with artificial intelligence and are illustrative in nature. They may not accurately represent the products, people, or events described in the articles.