Human Researchers Outperform AI in Medical Systematic Reviews, Study Finds

Human researchers still outperform large language models in systematic literature reviews: the best-performing model reached 93% accuracy in data extraction, but none produced a satisfactory final manuscript.

A recent study published in the journal Scientific Reports highlights the continuing superiority of human researchers over large language models (LLMs) in conducting systematic literature reviews. The findings emphasize that while LLMs have shown impressive capabilities in various applications, they are best utilized as supervised support tools rather than as independent authors for critical research tasks.

Large language models, which employ deep learning techniques to generate human-like text, have gained significant traction since the debut of OpenAI’s ChatGPT in 2022. These models are now frequently employed in sectors such as healthcare and education for their ability to interpret and generate text, with applications ranging from language translation to medical report drafting. Despite their rapid adoption, the potential risks and challenges associated with their integration into scientific research demand careful consideration.

The study aimed to assess whether LLMs could outperform human researchers in systematic literature reviews—a fundamental process in evidence-based medicine. Researchers compared the outputs of six different LLMs against an original systematic review conducted by human experts. The evaluation included tasks such as literature searches, article screening and selection, data extraction, and the final drafting of the review, with each task repeated to monitor improvements over time.

Among the LLMs tested, Gemini excelled in the initial literature search and selection phase, successfully identifying 13 out of 18 articles that human researchers included in their review. However, the study revealed significant limitations in the LLMs’ performance across other tasks, particularly in data summarization and drafting the final manuscript. These shortcomings are likely tied to the restricted access that many LLMs have to scientific article databases and the limited scope of their training datasets, which often lack sufficient original research articles.
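As a rough illustration of the screening figure above, the overlap between a model's selections and the human-curated set can be expressed as recall. The sketch below is illustrative only: the function name and article identifiers are hypothetical placeholders, and nothing here implies the study computed its results this way.

```python
def screening_recall(reference_ids, selected_ids):
    """Fraction of the human-included articles that the model also selected."""
    reference = set(reference_ids)
    if not reference:
        raise ValueError("reference set must not be empty")
    return len(reference & set(selected_ids)) / len(reference)


# Hypothetical placeholder IDs: 18 articles in the human review,
# of which the model found 13 (the counts reported for Gemini).
human_included = [f"article-{i}" for i in range(1, 19)]
model_selected = human_included[:13] + ["unrelated-1", "unrelated-2"]

recall = screening_recall(human_included, model_selected)
print(f"Recall: {recall:.1%}")  # prints "Recall: 72.2%"
```

On these counts, the model recovered roughly 72% of the articles the human reviewers included, which means about one in four would have been missed without human follow-up.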

Despite the challenges in the search task, the LLMs retrieved relevant articles faster than human researchers, suggesting potential utility for preliminary literature screening. During the data extraction and analysis phase, DeepSeek achieved an overall accuracy of 93%, but it required complex prompts and multiple uploads to yield results, a clear indicator of inefficiency relative to human efforts.

When it came to drafting the final manuscript, none of the LLMs produced fully satisfactory content. The generated articles often failed to follow the structured format required for systematic reviews. Although the outputs were well organized and used correct scientific language, they lacked the depth and nuance expected from expert analysis. That surface polish could mislead readers unfamiliar with the rigorous standards demanded in systematic reviews and meta-analyses.

Overall, the study concludes that modern LLMs are not yet capable of independently generating systematic reviews in the medical domain without the aid of well-designed prompts. However, the incremental improvements observed between evaluation rounds suggest that, under appropriate supervision, LLMs could serve as valuable adjuncts in certain aspects of the review process. Recent evidence supports the notion that guided prompting strategies can enhance LLM performance in specific review tasks.

Because the study focused on a single systematic review in the medical field, its findings may not generalize. Further research is needed to evaluate multiple systematic reviews across various biomedical and non-biomedical disciplines to strengthen the robustness and external validity of the results. As the integration of AI tools continues to evolve, understanding their strengths and limitations will be pivotal for advancing research practices in an increasingly technology-driven landscape.

For more details, refer to the study by Sollini et al., published in Scientific Reports, DOI: 10.1038/s41598-025-28993-5.

Written by the AiPressa Staff team.

© 2025 AIPressa · Part of Buzzora Media · All rights reserved.