A research team funded by the National Institutes of Health (NIH) has developed a groundbreaking machine learning model named Merlin, which has the potential to significantly enhance the insights derived from medical scans regarding disease. By analyzing 3D abdominal computed tomography (CT) scans, the model can perform a range of tasks—from identifying anatomical features to predicting disease onset years in advance. Remarkably, Merlin outperformed several dedicated automated tools in tasks they were specifically designed to handle.
The research team trained Merlin on a comprehensive dataset of patient CT scans linked to radiology reports and medical diagnosis codes obtained from the Stanford University School of Medicine. This collection is noted as the largest of its kind focused on abdominal CT data to date.
“Rich datasets like this are necessary to push the limits of what artificial intelligence models can accomplish in medicine. This work exemplifies how meticulously crafted training data can enable remarkable insights that significantly streamline workflows and assist in clinical decision-making,” said Bruce Tromberg, Ph.D., director of NIH’s National Institute of Biomedical Imaging and Bioengineering (NIBIB).
CT scans are commonly used early in the diagnostic process, but they must be interpreted by radiologists, a step that can be slow and complex, particularly amid the ongoing physician shortage in the United States. Merlin could streamline this process by enabling a more direct path from imaging to potential diagnosis, a capability emphasized by co-first author Louis Blankemeier, Ph.D., who conducted this study as a graduate student at Stanford.
Merlin is categorized as a foundation model, trained on large-scale, unlabeled datasets that encompass diverse types of information. In the latest study, researchers evaluated Merlin across six broad categories of tasks, comprising more than 750 individual tasks related to diagnostics, prognostics, and quality assessment.
To prepare Merlin for its extensive range of tasks, the team initially trained the model using their clinical data trove, which included more than 15,000 3D abdominal CT scans linked with radiology reports and nearly one million diagnostic codes. This foundational knowledge allowed Merlin to learn the connections between visual data and corresponding textual information.
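The article does not spell out Merlin’s training objective, but learning connections between scans and their paired reports is typically done with a contrastive (CLIP-style) loss, in which matching image/text embedding pairs are pulled together and mismatched pairs pushed apart. The sketch below is a minimal, hypothetical illustration of that idea in NumPy; the encoder outputs, dimensions, and temperature are stand-ins, not details from the paper.

```python
import numpy as np

def info_nce_loss(image_emb: np.ndarray, text_emb: np.ndarray,
                  temperature: float = 0.07) -> float:
    """Symmetric InfoNCE loss over a batch of paired embeddings.

    Row i of image_emb is assumed to be the scan paired with the report
    in row i of text_emb, so matching pairs lie on the diagonal of the
    similarity matrix.
    """
    # L2-normalize so dot products are cosine similarities.
    img = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    txt = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)

    logits = img @ txt.T / temperature  # (batch, batch) similarity matrix
    idx = np.arange(len(logits))        # correct pair for row i is column i

    def cross_entropy(l: np.ndarray) -> float:
        # Negative log-softmax of the diagonal, averaged over the batch.
        l = l - l.max(axis=1, keepdims=True)
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[idx, idx].mean()

    # Average the image->text and text->image directions.
    return (cross_entropy(logits) + cross_entropy(logits.T)) / 2

rng = np.random.default_rng(0)
# Toy stand-ins for encoder outputs: 4 scan/report pairs, 8-dim embeddings.
pairs = rng.normal(size=(4, 8))
loss_aligned = info_nce_loss(pairs, pairs)                   # perfectly matched pairs: low loss
loss_random = info_nce_loss(pairs, rng.normal(size=(4, 8)))  # unrelated pairs: higher loss
```

Minimizing this loss drives the model to place each scan nearest its own report in the shared embedding space, which is what makes downstream tasks like code prediction possible without task-specific labels.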
Subsequently, the researchers tested Merlin on more than 50,000 CT scans from various hospitals to assess its ability to replicate human-generated conclusions. “Merlin tackled some tasks, such as predicting diagnosis codes, head-on, while other more complicated tasks, such as drafting radiology reports from scratch or identifying and outlining organs in a 3D space, called for additional training,” said co-first author Ashwin Kumar, also a graduate student at Stanford.
The team employed state-of-the-art models specifically designed for each task type as benchmarks for comparison. Merlin achieved an impressive average success rate of over 81% in predicting which of two scans was more likely associated with a particular diagnostic code across 692 different codes, outpacing several variants of other specialized models. For a subset of 102 codes, its performance climbed to 90%.
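The two-scan comparison described above can be read as a concordance measure: for a given diagnostic code, how often does the model assign a higher score to the scan that actually carries the code than to one that does not? The helper below is a hypothetical sketch of that metric (equivalent to AUROC computed over all positive/negative pairs); the scores and names are illustrative, not from the paper.

```python
def pairwise_accuracy(pos_scores: list[float], neg_scores: list[float]) -> float:
    """Fraction of (positive, negative) scan pairs where the model scores
    the code-positive scan higher. Ties count as half-correct, matching
    the usual AUROC convention.
    """
    wins = ties = 0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1
            elif p == n:
                ties += 1
    return (wins + 0.5 * ties) / (len(pos_scores) * len(neg_scores))

# Toy model scores for scans with and without a given diagnostic code.
acc = pairwise_accuracy([0.9, 0.7, 0.6], [0.5, 0.65])
# 6 pairs, 5 scored correctly -> 5/6
```

A value of 0.5 corresponds to chance, so the reported averages of 81% and 90% indicate the model ranks code-positive scans well above code-negative ones.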
In another significant aspect of their research, the team tasked Merlin with predicting the onset of chronic diseases—such as diabetes, osteoporosis, and heart disease—in healthy patients, based solely on CT scan data. The model identified patients at higher risk of developing a specific disease within five years 75% of the time, compared to 68% for benchmarks.
The researchers further challenged Merlin by analyzing CT scans of the chest, an area not included in its training material. Remarkably, Merlin’s ability to discern generalizable features of disease enabled it to perform at levels comparable to, or better than, models specifically trained on chest scans.
Despite being a generalist rather than a task-specific model, Merlin exceeded or matched the performance of specialist models across all assessed tasks, a feat attributed to its architecture and the extensive training data that facilitated its understanding of complex 3D scans and the relationships between visual and textual data.
Looking ahead, the researchers are optimistic about leveraging their findings to secure regulatory approvals for simpler tasks, while also refining Merlin to tackle more complex challenges, such as report writing. Although powerful out of the box, the team encourages users to fine-tune the model with their own datasets for tailored applications.
“Our model and the data will provide the community a robust backbone to build upon,” said senior author Akshay Chaudhari, Ph.D., a professor of radiology and biomedical data science at Stanford University. “From here, the sky’s the limit.”
This groundbreaking research was supported by various NIH initiatives and grants, underscoring the promise of advanced AI in transforming medical imaging and diagnostics.
Source: National Institutes of Health (NIH)
Journal reference: Blankemeier, L., et al. (2026). Merlin: a computed tomography vision–language foundation model and dataset. Nature. DOI: 10.1038/s41586-026-10181-8. https://www.nature.com/articles/s41586-026-10181-8