
NIH-Funded Merlin Model Surpasses 81% Accuracy in Automating Complex CT Diagnostics

NIH-funded Merlin model achieves over 81% accuracy in automating complex CT diagnostics, predicting chronic disease onset years in advance.

A research team funded by the National Institutes of Health (NIH) has developed a machine learning model named Merlin that could substantially expand what medical scans reveal about disease. By analyzing 3D abdominal computed tomography (CT) scans, the model can perform a range of tasks, from identifying anatomical features to predicting disease onset years in advance. Notably, Merlin outperformed several dedicated automated tools on the very tasks they were designed to handle.

The team trained Merlin on a dataset from the Stanford University School of Medicine that pairs patient CT scans with radiology reports and medical diagnosis codes. The researchers describe it as the largest abdominal CT dataset of its kind to date.

“Rich datasets like this are necessary to push the limits of what artificial intelligence models can accomplish in medicine. This work exemplifies how meticulously crafted training data can enable remarkable insights that significantly streamline workflows and assist in clinical decision-making,” said Bruce Tromberg, Ph.D., director of NIH’s National Institute of Biomedical Imaging and Bioengineering (NIBIB).

CT scans are often ordered early in the diagnostic process and must be interpreted by radiologists, a task that can be slow and complex, particularly given the ongoing physician shortage in the United States. According to co-first author Louis Blankemeier, Ph.D., who conducted the study as a graduate student at Stanford, Merlin could streamline this process by moving more directly from imaging to a potential diagnosis.

Merlin is a foundation model, a type of model trained on large-scale, unlabeled datasets that span diverse types of information. In the new study, the researchers evaluated Merlin on six broad task categories comprising more than 750 individual tasks in diagnostics, prognostics, and quality assessment.

To prepare Merlin for this range of tasks, the team first trained it on their clinical dataset: more than 15,000 three-dimensional abdominal CT scans paired with radiology reports and nearly one million diagnosis codes. This pretraining taught Merlin the connections between what appears in an image and the corresponding text.
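The article does not specify Merlin's training objective. One widely used way to teach a model which images belong with which texts is a CLIP-style contrastive loss, sketched below in plain Python under that assumption. The embeddings are made-up stand-ins for the outputs of hypothetical image and report encoders; this is an illustration of the general technique, not Merlin's code.

```python
import math

# Sketch of a CLIP-style symmetric contrastive (InfoNCE) loss.
# ASSUMPTION: illustrates the general image-text technique; not Merlin's published code.

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def log_softmax_at(row, idx):
    """log(softmax(row)[idx]), computed stably by subtracting the row max."""
    m = max(row)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
    return row[idx] - log_sum_exp

def contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Scan i should match report i and no other report in the batch (and vice versa)."""
    n = len(image_embs)
    logits = [[cosine(image_embs[i], text_embs[j]) / temperature for j in range(n)]
              for i in range(n)]
    # Image -> text: in row i, the correct column is i.
    image_to_text = -sum(log_softmax_at(logits[i], i) for i in range(n)) / n
    # Text -> image: in column j, the correct row is j.
    text_to_image = -sum(log_softmax_at([logits[i][j] for i in range(n)], j)
                         for j in range(n)) / n
    return (image_to_text + text_to_image) / 2

# Hypothetical 2-D embeddings for two scans and their two reports.
scans = [[1.0, 0.0], [0.0, 1.0]]
reports = [[0.9, 0.1], [0.1, 0.9]]
aligned = contrastive_loss(scans, reports)
mismatched = contrastive_loss(scans, reports[::-1])
print(aligned < mismatched)  # True: correct pairings give a lower loss
```

Pulling matched scan-report pairs together while pushing mismatched pairs apart is what lets a single embedding space serve many downstream tasks. The temperature of 0.07 is a commonly used starting value, not a figure taken from the study.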

The researchers then tested Merlin on more than 50,000 CT scans from multiple hospitals to see how closely it could reproduce conclusions reached by clinicians. “Merlin tackled some tasks, such as predicting diagnosis codes, head-on, while other more complicated tasks, such as drafting radiology reports from scratch or identifying and outlining organs in a 3D space, called for additional training,” said co-first author Ashwin Kumar, also a graduate student at Stanford.

The team benchmarked Merlin against state-of-the-art models built for each task type. Averaged across 692 diagnostic codes, Merlin correctly picked which of two scans was more likely associated with a given code more than 81% of the time, outperforming several variants of other specialized models. For a subset of 102 codes, its performance rose to 90%.
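The "which of two scans" comparison described above matches the standard probabilistic reading of the area under the ROC curve (AUROC): the chance that a scan carrying the code receives a higher score than one without it, where 50% is chance level. The article does not name the metric, so treating the 81% figure as an AUROC averaged equally over codes is an assumption. The code names and scores below are hypothetical.

```python
from itertools import product
from statistics import mean

def pairwise_score(scores, labels):
    """Share of (positive, negative) scan pairs in which the scan that carries the
    diagnosis code gets the higher model score. Ties count as half a win.
    This quantity equals the area under the ROC curve (AUROC)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("each code needs at least one positive and one negative scan")
    wins = 0.0
    for p, n in product(positives, negatives):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(positives) * len(negatives))

def average_over_codes(per_code):
    """Unweighted mean of the per-code pairwise score (each code counts equally)."""
    return mean(pairwise_score(scores, labels) for scores, labels in per_code.values())

# Hypothetical model scores and ground-truth labels for two made-up codes.
per_code = {
    "code_A": ([0.9, 0.8, 0.3, 0.2], [1, 1, 0, 0]),  # every pair ranked correctly -> 1.0
    "code_B": ([0.6, 0.7, 0.5, 0.1], [1, 0, 1, 0]),  # 2 of 4 pairs correct -> 0.5
}
print(average_over_codes(per_code))  # 0.75
```

Because each code is weighted equally, a handful of hard codes can pull the average down noticeably, which is consistent with a higher score on a subset of codes than on the full set.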

The team also asked Merlin to predict the onset of chronic diseases, such as diabetes, osteoporosis, and heart disease, in healthy patients using CT scan data alone. The model correctly identified patients at higher risk of developing a given disease within five years 75% of the time, compared with 68% for the benchmark models.

The researchers further challenged Merlin with chest CT scans, a body region not included in its training data. Because Merlin had learned generalizable features of disease, it performed as well as or better than models trained specifically on chest scans.

Although Merlin is a generalist, it matched or exceeded the performance of specialist models on every task assessed. The researchers attribute this to its architecture and to the large training dataset, which helped it interpret complex 3D scans and the relationships between images and text.

Looking ahead, the researchers hope to use these findings to pursue regulatory approval for simpler tasks while refining Merlin for more complex challenges, such as report writing. Although the model performs well out of the box, the team encourages users to fine-tune it on their own datasets for specific applications.
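The article does not say how users should fine-tune Merlin. A common, lightweight way to adapt any foundation model to a local dataset is a linear probe: keep the pretrained backbone frozen, extract one embedding per scan, and train only a small classifier on top. The sketch below assumes the embeddings have already been extracted (the vectors are made up) and uses plain logistic regression; all function names are hypothetical.

```python
import math

# Sketch of a "linear probe": adapt a frozen foundation model to a local task by
# training only a logistic-regression head on its embeddings.
# ASSUMPTION: embeddings come from a frozen backbone; values below are invented.

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(w, b, x):
    """Probability that embedding x belongs to the positive class."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def train_linear_probe(embeddings, labels, lr=0.5, epochs=500):
    """Full-batch gradient descent on the logistic (cross-entropy) loss.
    Only w and b are learned; the backbone that produced the embeddings is untouched."""
    dim, n = len(embeddings[0]), len(embeddings)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(embeddings, labels):
            err = predict_proba(w, b, x) - y  # gradient of the loss w.r.t. the logit
            for k in range(dim):
                grad_w[k] += err * x[k]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

# Hypothetical embeddings for four local scans (1 = finding present, 0 = absent).
X = [[2.0, 0.1], [1.5, -0.2], [-1.8, 0.3], [-2.2, 0.0]]
y = [1, 1, 0, 0]
w, b = train_linear_probe(X, y)
print(predict_proba(w, b, [1.7, 0.0]) > 0.5)  # True
```

A linear probe is cheap and needs relatively little labeled data; full fine-tuning, which also updates the backbone's weights, can adapt further but requires more data and compute.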

“Our model and the data will provide the community a robust backbone to build upon,” said senior author Akshay Chaudhari, Ph.D., a professor of radiology and biomedical data science at Stanford University. “From here, the sky’s the limit.”

The research was supported by several NIH initiatives and grants.

Source: National Institutes of Health (NIH)

Journal reference: Blankemeier, L., et al. (2026). Merlin: a computed tomography vision–language foundation model and dataset. Nature. DOI: 10.1038/s41586-026-10181-8. https://www.nature.com/articles/s41586-026-10181-8

Written by AiPressa Staff



© 2025 AIPressa · Part of Buzzora Media · All rights reserved.