Hugging Face has launched ML Intern, an open-source AI agent designed to autonomously research, write, and execute machine learning code. Initial benchmarks indicate that it surpasses Anthropic’s Claude Code in scientific reasoning and OpenAI’s Codex in healthcare evaluations. The tool, developed by Hugging Face’s AI agents team, is pitched as an automated counterpart to the post-training research loop used by the company’s own machine learning researchers. It is available immediately as a command-line interface (CLI) and as a web application for mobile and desktop.
Aksel Joonas Reedi, a member of Hugging Face’s AI agents team, announced the launch on LinkedIn. He detailed that ML Intern can pull academic papers from arXiv and hf.co/papers, navigate citation graphs, reformat datasets, and launch training jobs on Hugging Face Jobs when local GPUs are unavailable. To incentivize early users, Hugging Face is offering $1,000 in GPU credits along with resources from Anthropic.
Reedi also described a demonstration in which ML Intern was tasked with training the best possible large language model (LLM) for scientific reasoning. Through citation searches, the agent surfaced relevant NVIDIA research, including OpenScience and Nemotron-CrossThink, before executing 12 supervised fine-tuning passes on Qwen3-1.7B. That effort lifted the model’s GPQA score from 10% to 32% in under 10 hours, against Claude Code’s best score of 22.99%, he noted.
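For a concrete picture of what one such fine-tuning pass involves, the sketch below uses TRL’s SFTTrainer with the Qwen3-1.7B base model named in the post. The dataset and training settings are placeholders, not ML Intern’s actual configuration.

```python
# Minimal sketch of a single supervised fine-tuning pass with TRL.
# Qwen/Qwen3-1.7B is the base model named in the post; the dataset and
# hyperparameters below are placeholders, not the agent's real choices.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Placeholder dataset: any prompt/completion-style reasoning corpus works here.
dataset = load_dataset("trl-lib/Capybara", split="train")

trainer = SFTTrainer(
    model="Qwen/Qwen3-1.7B",
    train_dataset=dataset,
    args=SFTConfig(output_dir="qwen3-sft-pass-01", num_train_epochs=1),
)
trainer.train()
```

Chaining several such passes, each with a different data mix, is presumably what the post describes as the agent’s 12-run sweep.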
In a healthcare evaluation, ML Intern determined that existing datasets were inadequate, so it wrote a script to generate 1,100 synthetic data points covering emergency, client, and multilingual communication. The agent then upsampled the data 50 times for training, achieving a 60% performance improvement over Codex on HealthBench. For a competitive mathematics challenge, it wrote a complete GRPO training script and executed it on A100 GPUs via Hugging Face Spaces, running ablations until the runs succeeded.
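For context on what a GRPO training script entails: GRPO samples several completions per prompt and reinforces the higher-scoring ones under a reward function. The sketch below follows the pattern from TRL’s documentation, with a toy length-based reward; it is illustrative, not the script ML Intern produced.

```python
# GRPO sketch in the style of TRL's documentation, not ML Intern's
# generated script. The reward function and dataset are toy choices.
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

dataset = load_dataset("trl-lib/tldr", split="train")

# Toy reward: prefer completions close to 20 characters. A math task
# would instead score the correctness of the final answer.
def reward_len(completions, **kwargs):
    return [-abs(20 - len(completion)) for completion in completions]

trainer = GRPOTrainer(
    model="Qwen/Qwen2-0.5B-Instruct",
    reward_funcs=reward_len,
    args=GRPOConfig(output_dir="Qwen2-0.5B-GRPO"),
    train_dataset=dataset,
)
trainer.train()
```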
The project’s public documentation outlines that ML Intern runs an agentic loop capped at 300 iterations per task. It includes a context manager that handles message history and auto-compaction, plus a tool router that connects Hugging Face documentation, datasets, Jobs, and papers, along with GitHub code search and sandboxed execution. The CLI can be installed via uv and accepts any inference-provider model ID, with the default configuration pointing users to Anthropic’s Claude models.
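The pieces the documentation names fit a familiar pattern, sketched schematically below: a bounded loop, a router that dispatches tool calls, and a compaction step that keeps message history from growing without bound. All names here are illustrative; this is not ML Intern’s actual code.

```python
# Toy agentic loop illustrating the documented design, not ML Intern's
# internals: a 300-iteration cap, a tool router, and auto-compaction.
MAX_ITERATIONS = 300   # per-task cap cited in the public documentation
KEEP_LAST = 5          # assumption: how many recent turns survive compaction

def compact(messages):
    """Fold older turns into one summary message to bound context size."""
    old, recent = messages[1:-KEEP_LAST], messages[-KEEP_LAST:]
    summary = " | ".join(m["content"][:60] for m in old)
    return [messages[0], {"role": "system", "content": f"[compacted] {summary}"}, *recent]

def run_task(task, model, tools, compact_after=40):
    """model(messages) returns a dict with either 'tool' + 'args' or a final 'content'."""
    messages = [{"role": "user", "content": task}]
    for _ in range(MAX_ITERATIONS):
        step = model(messages)
        if "tool" in step:                        # tool router: docs, datasets, jobs, papers...
            result = tools[step["tool"]](**step.get("args", {}))
            messages.append({"role": "tool", "content": str(result)})
        else:
            return step["content"]                # no tool call: the task is finished
        if len(messages) > compact_after:         # context manager: auto-compaction
            messages = compact(messages)
    raise RuntimeError("iteration budget exhausted")
```

With a stub model and tool table, run_task executes as written, which is enough to see where the 300-iteration cap and auto-compaction sit in the control flow.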
Reedi emphasized on LinkedIn that the agent is designed to closely reflect the work and thinking patterns of researchers, asserting that it “knows how data should look like and what good models feel like.”
The $1,000 incentive for the fastest early adopters arrives as universities, bootcamps, and EdTech startups face growing pressure to give students and staff hands-on experience in model training without incurring high commercial cloud fees.
The pressing question now is how ML Intern’s autonomous capabilities will perform on less controlled, real-world educational datasets, where issues of data quality, consent, and licensing may arise. Hugging Face asserts that the agent is open source and built within its own ecosystem, allowing the community to publicly test its boundaries and limitations.