A research team funded by the National Institutes of Health (NIH) has developed a groundbreaking machine learning model named Merlin, which has the potential to significantly enhance the insights derived from medical scans regarding disease. By analyzing 3D abdominal computed tomography (CT) scans, the model can perform a range of tasks—from identifying anatomical features to predicting disease onset years in advance. Remarkably, Merlin outperformed several dedicated automated tools in tasks they were specifically designed to handle.
The research team trained Merlin on a comprehensive dataset of patient CT scans linked to radiology reports and medical diagnosis codes obtained from the Stanford University School of Medicine. This collection is noted as the largest of its kind focused on abdominal CT data to date.
“Rich datasets like this are necessary to push the limits of what artificial intelligence models can accomplish in medicine. This work exemplifies how meticulously crafted training data can enable remarkable insights that significantly streamline workflows and assist in clinical decision-making,” said Bruce Tromberg, Ph.D., director of NIH’s National Institute of Biomedical Imaging and Bioengineering (NIBIB).
CT scans are commonly used early in the diagnostic process, but they must be interpreted by radiologists, a step that can be slow and complex, particularly amid the ongoing physician shortage in the United States. Merlin could streamline this process by enabling a more direct path from imaging to potential diagnosis, a capability emphasized by co-first author Louis Blankemeier, Ph.D., who conducted this study as a graduate student at Stanford.
Merlin is categorized as a foundation model, trained on large-scale, unlabeled datasets that encompass diverse types of information. In the latest study, researchers evaluated Merlin across six broad categories of tasks, comprising more than 750 individual tasks related to diagnostics, prognostics, and quality assessment.
To prepare Merlin for its extensive range of tasks, the team initially trained the model using their clinical data trove, which included more than 15,000 3D abdominal CT scans linked with radiology reports and nearly one million diagnostic codes. This foundational knowledge allowed Merlin to learn the connections between visual data and corresponding textual information.
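The article does not spell out Merlin’s training objective, but learning connections between scans and their paired reports is typically done with a contrastive (CLIP-style) loss, in which matching image/text embedding pairs are pulled together and mismatched pairs pushed apart. The sketch below is a minimal, hypothetical illustration of that idea in NumPy; the encoder outputs, dimensions, and temperature are stand-ins, not details from the paper.

```python
import numpy as np

def info_nce_loss(image_emb: np.ndarray, text_emb: np.ndarray,
                  temperature: float = 0.07) -> float:
    """Symmetric InfoNCE loss over a batch of paired embeddings.

    Row i of image_emb is assumed to be the scan paired with the report
    in row i of text_emb, so matching pairs lie on the diagonal of the
    similarity matrix.
    """
    # L2-normalize so dot products are cosine similarities.
    img = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    txt = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)

    logits = img @ txt.T / temperature  # (batch, batch) similarity matrix
    idx = np.arange(len(logits))        # correct pair for row i is column i

    def cross_entropy(l: np.ndarray) -> float:
        # Negative log-softmax of the diagonal, averaged over the batch.
        l = l - l.max(axis=1, keepdims=True)
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[idx, idx].mean()

    # Average the image->text and text->image directions.
    return (cross_entropy(logits) + cross_entropy(logits.T)) / 2

rng = np.random.default_rng(0)
# Toy stand-ins for encoder outputs: 4 scan/report pairs, 8-dim embeddings.
pairs = rng.normal(size=(4, 8))
loss_aligned = info_nce_loss(pairs, pairs)                   # perfectly matched pairs: low loss
loss_random = info_nce_loss(pairs, rng.normal(size=(4, 8)))  # unrelated pairs: higher loss
```

Minimizing this loss drives the model to place each scan nearest its own report in the shared embedding space, which is what makes downstream tasks like code prediction possible without task-specific labels.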
Subsequently, the researchers tested Merlin on more than 50,000 CT scans from various hospitals to assess its ability to replicate human-generated conclusions. “Merlin tackled some tasks, such as predicting diagnosis codes, head-on, while other more complicated tasks, such as drafting radiology reports from scratch or identifying and outlining organs in a 3D space, called for additional training,” said co-first author Ashwin Kumar, also a graduate student at Stanford.
The team employed state-of-the-art models specifically designed for each task type as benchmarks for comparison. Merlin achieved an impressive average success rate of over 81% in predicting which of two scans was more likely associated with a particular diagnostic code across 692 different codes, outpacing several variants of other specialized models. For a subset of 102 codes, its performance climbed to 90%.
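The two-scan comparison described above can be read as a concordance measure: for a given diagnostic code, how often does the model assign a higher score to the scan that actually carries the code than to one that does not? The helper below is a hypothetical sketch of that metric (equivalent to AUROC computed over all positive/negative pairs); the scores and names are illustrative, not from the paper.

```python
def pairwise_accuracy(pos_scores: list[float], neg_scores: list[float]) -> float:
    """Fraction of (positive, negative) scan pairs where the model scores
    the code-positive scan higher. Ties count as half-correct, matching
    the usual AUROC convention.
    """
    wins = ties = 0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1
            elif p == n:
                ties += 1
    return (wins + 0.5 * ties) / (len(pos_scores) * len(neg_scores))

# Toy model scores for scans with and without a given diagnostic code.
acc = pairwise_accuracy([0.9, 0.7, 0.6], [0.5, 0.65])
# 6 pairs, 5 scored correctly -> 5/6
```

A value of 0.5 corresponds to chance, so the reported averages of 81% and 90% indicate the model ranks code-positive scans well above code-negative ones.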
In another significant aspect of their research, the team tasked Merlin with predicting the onset of chronic diseases—such as diabetes, osteoporosis, and heart disease—in healthy patients, based solely on CT scan data. The model identified patients at higher risk of developing a specific disease within five years 75% of the time, compared to 68% for benchmarks.
The researchers further challenged Merlin by analyzing CT scans of the chest, an area not included in its training material. Remarkably, Merlin’s ability to discern generalizable features of disease enabled it to perform at levels comparable to, or better than, models specifically trained on chest scans.
Despite being a generalist rather than a task-specific model, Merlin exceeded or matched the performance of specialist models across all assessed tasks, a feat attributed to its architecture and the extensive training data that facilitated its understanding of complex 3D scans and the relationships between visual and textual data.
Looking ahead, the researchers are optimistic about leveraging their findings to secure regulatory approvals for simpler tasks, while also refining Merlin to tackle more complex challenges, such as report writing. Although powerful out of the box, the team encourages users to fine-tune the model with their own datasets for tailored applications.
“Our model and the data will provide the community a robust backbone to build upon,” said senior author Akshay Chaudhari, Ph.D., a professor of radiology and biomedical data science at Stanford University. “From here, the sky’s the limit.”
This groundbreaking research was supported by various NIH initiatives and grants, underscoring the promise of advanced AI in transforming medical imaging and diagnostics.
Source: National Institutes of Health (NIH)
Journal reference: Blankemeier, L., et al. (2026). Merlin: a computed tomography vision–language foundation model and dataset. Nature. DOI: 10.1038/s41586-026-10181-8. https://www.nature.com/articles/s41586-026-10181-8