Google DeepMind’s AI Co-Clinician Surpasses GPT-5.4 in Blind Doctor Tests

Google DeepMind’s AI co-clinician outperformed GPT-5.4 in blind doctor tests: physicians preferred its answers in 63 of 98 primary care queries, versus 30 for GPT-5.4, and it reached a 95% quality score on open-ended medication questions.

Google DeepMind is advancing healthcare technology with its development of an “AI co-clinician” designed to assist doctors in patient care. Initial simulation studies indicate promising results, though the AI system has not yet matched the performance of seasoned physicians. Additionally, the research highlights limitations of ChatGPT’s voice mode for serious applications, particularly in medical consultations.

The AI co-clinician operates within a framework termed “triadic care,” in which AI agents support patients under the supervision of doctors, who retain clinical authority. This collaborative approach aims to enhance patient treatment while keeping oversight in the hands of qualified medical professionals.

To assess the system from a clinician’s viewpoint, researchers collaborated with academic physicians to apply the NOHARM framework, which evaluates two categories of mistakes: errors of commission (harmful or incorrect content in a response) and errors of omission (necessary information left out). In a blind comparison involving 98 primary care queries, doctors favored the AI co-clinician’s responses over those of leading evidence synthesis tools. The AI co-clinician was preferred in 67 cases versus 26 for an existing clinical AI system, and in 63 cases versus 30 for GPT-5.4-thinking-with-search. Notably, the AI co-clinician made a critical error in only one of the 98 cases evaluated.
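The preference counts above can be turned into shares with a few lines of Python. This is only an illustrative sketch: the class and field names are made up for this example, and it assumes that both comparisons ran over the same 98 queries and that any query not counted for either side was a tie or a no-preference rating, which the article does not state.

```python
from dataclasses import dataclass

TOTAL_QUERIES = 98  # number of primary care queries reported in the article


@dataclass(frozen=True)
class Comparison:
    """One head-to-head blind comparison (hypothetical helper type)."""
    opponent: str
    co_clinician_wins: int
    opponent_wins: int

    @property
    def undecided(self) -> int:
        # Assumption: the remainder are ties or no-preference ratings.
        return TOTAL_QUERIES - self.co_clinician_wins - self.opponent_wins

    @property
    def win_share(self) -> float:
        # Share of all queries where the co-clinician was preferred.
        return self.co_clinician_wins / TOTAL_QUERIES

    @property
    def win_share_decided(self) -> float:
        # Share among queries where doctors expressed a preference.
        decided = self.co_clinician_wins + self.opponent_wins
        return self.co_clinician_wins / decided


comparisons = [
    Comparison("existing clinical AI system", 67, 26),
    Comparison("GPT-5.4-thinking-with-search", 63, 30),
]

for c in comparisons:
    print(
        f"vs {c.opponent}: preferred in {c.win_share:.1%} of all queries, "
        f"{c.win_share_decided:.1%} of decided queries, "
        f"{c.undecided} undecided"
    )
```

Under these assumptions, the co-clinician was preferred in roughly two thirds of all queries in both comparisons, with five queries unaccounted for in each.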

The lead over physicians was particularly pronounced in medication inquiries. The RxQA benchmark, which includes 600 questions on active ingredients, interactions, and dosages sourced from national drug directories and vetted by licensed pharmacists, proved challenging for primary care physicians. With reference materials, doctors answered 61.3 percent correctly; without external assistance, that dropped to 48.3 percent. The AI co-clinician scored 73.3 percent, narrowly ahead of GPT-5.4-thinking-with-search at 72.7 percent. The gap between the two models widened when questions were posed in an open-ended format, typical of real-world searches: here, the AI co-clinician achieved a quality score of 95.0 percent, compared to 90.9 percent for OpenAI’s model.
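To make the RxQA comparison concrete, the sketch below computes the percentage-point gaps between the reported scores. The dictionary keys and the `gap` helper are hypothetical names chosen for this example. The percentages are deliberately not converted into question counts, because the article does not say how many of the 600 questions each group answered.

```python
# Scores in percent, as reported in the article.
# "standard_format" is a label invented here for the non-open-ended setting.
standard_format = {
    "physicians_with_references": 61.3,
    "physicians_unaided": 48.3,
    "ai_co_clinician": 73.3,
    "gpt_5_4_thinking_with_search": 72.7,
}
open_ended = {
    "ai_co_clinician": 95.0,
    "gpt_5_4_thinking_with_search": 90.9,
}


def gap(scores: dict, a: str, b: str) -> float:
    """Percentage-point difference a - b, rounded to one decimal place."""
    return round(scores[a] - scores[b], 1)


print("AI vs physicians with references:",
      gap(standard_format, "ai_co_clinician", "physicians_with_references"))
print("AI vs unaided physicians:",
      gap(standard_format, "ai_co_clinician", "physicians_unaided"))
print("AI vs GPT-5.4 (standard format):",
      gap(standard_format, "ai_co_clinician", "gpt_5_4_thinking_with_search"))
print("AI vs GPT-5.4 (open-ended):",
      gap(open_ended, "ai_co_clinician", "gpt_5_4_thinking_with_search"))
```

The margin over GPT-5.4-thinking-with-search is well under one point in the standard format and several points in the open-ended format, while the margin over physicians is in the double digits either way.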

In addition to text-based support, Google DeepMind is exploring the AI co-clinician’s capabilities in telemedicine through real-time audio and video interactions. Partnering with physicians at Harvard and Stanford, researchers conducted a randomized simulation study involving 20 synthetic clinical scenarios and 10 doctors acting as patient representatives, for a total of 120 hypothetical telemedicine consultations. The AI co-clinician demonstrated abilities that extend beyond text-only systems, such as correcting a patient’s inhaler technique and guiding patients through shoulder exams to identify rotator cuff injuries.

In patient-facing dialogues, the AI co-clinician employs a dual-agent configuration: a “Planner” module oversees the conversation to ensure the “Talker” agent adheres to safe clinical practices. When utilized by doctors, the system emphasizes solid clinical evidence and conducts verification and citation checks during information retrieval.
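The article gives no implementation details for the dual-agent configuration, so the following is only a toy sketch of the general pattern of one component reviewing another's draft before it reaches the patient. Everything in it is hypothetical: the class names reuse the article's labels, but the phrase list, the fallback message, and the `respond` function are invented for illustration and are not DeepMind's method.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical rule list: phrases a draft must not contain without clinician review.
BLOCKED_PHRASES = ("stop taking", "double your dose")


@dataclass
class Talker:
    """Drafts the patient-facing reply; `generate` stands in for a model call."""
    generate: Callable[[str], str]

    def draft(self, patient_message: str) -> str:
        return self.generate(patient_message)


@dataclass
class Planner:
    """Reviews each draft and replaces it if it breaks a (toy) safety rule."""
    blocked_phrases: tuple = BLOCKED_PHRASES
    fallback: str = "I'd like your doctor to review that before I advise further."

    def review(self, draft: str) -> str:
        lowered = draft.lower()
        if any(phrase in lowered for phrase in self.blocked_phrases):
            return self.fallback
        return draft


def respond(patient_message: str, talker: Talker, planner: Planner) -> str:
    """The Talker drafts, the Planner decides what the patient actually hears."""
    return planner.review(talker.draft(patient_message))


planner = Planner()
safe_talker = Talker(lambda msg: "Can you describe when the pain started?")
risky_talker = Talker(lambda msg: "You should double your dose tonight.")

print(respond("My shoulder hurts.", safe_talker, planner))
print(respond("My inhaler isn't helping.", risky_talker, planner))
```

A keyword filter like this is far simpler than anything a clinical system would need; the point is only the separation of roles, where the component that speaks never has the final say on its own output.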

Despite these advancements, experienced physicians outperformed the AI co-clinician overall in the study, which assessed 140 aspects of consultation quality spanning triage, history taking, clinical reasoning, communication and counseling, treatment steps, recognizing warning signs, and conducting physical exams. The AI co-clinician matched or exceeded primary care physicians on 68 of those 140 aspects, but it lagged behind seasoned doctors elsewhere, especially in identifying critical warning signs and performing thorough physical examinations. OpenAI’s GPT-realtime ranked lowest across all seven evaluated domains. The researchers concluded that AI systems like this are best used as supportive tools for healthcare professionals rather than as substitutes for their clinical judgment.

It remains uncertain whether this research initiative will evolve into a commercially available product. Although the results underscore progress in AI-driven evidence synthesis and telemedicine applications, a clear gap persists relative to experienced physicians, particularly in safety-critical scenarios. “While it’s early days, the promise is clear,” noted DeepMind researcher Alan Karthikesalingam.

Written by the AiPressa Staff team.

© 2025 AIPressa · Part of Buzzora Media · All rights reserved. This website provides general news and educational content for informational purposes only. While we strive for accuracy, we do not guarantee the completeness or reliability of the information presented. The content should not be considered professional advice of any kind. Readers are encouraged to verify facts and consult appropriate experts when needed. We are not responsible for any loss or inconvenience resulting from the use of information on this site. Some images used on this website are generated with artificial intelligence and are illustrative in nature. They may not accurately represent the products, people, or events described in the articles.