Google DeepMind’s AI Co-Clinician Surpasses GPT-5.4 in Blind Doctor Tests

In blind physician comparisons across 98 primary care queries, Google DeepMind’s AI co-clinician was preferred over GPT-5.4 by 63 to 30, and it earned a 95% quality score on open-ended medication questions.

Staff

Published

12 minutes ago

Google DeepMind is advancing healthcare technology with its development of an “AI co-clinician” designed to assist doctors in patient care. Initial simulation studies indicate promising results, though the AI system has not yet matched the performance of seasoned physicians. Additionally, the research highlights limitations of ChatGPT’s voice mode for serious applications, particularly in medical consultations.

The AI co-clinician operates within a framework termed “triadic care,” in which AI agents support patients under the supervision of doctors, who retain clinical authority. This collaborative approach aims to improve patient treatment while keeping oversight in the hands of qualified medical professionals.

To assess the system from a clinician’s viewpoint, researchers collaborated with academic physicians to apply the NOHARM framework, which evaluates two categories of mistakes: errors of commission (doing something wrong) and errors of omission (leaving out something needed). In a blind comparison involving 98 primary care queries, doctors favored the AI co-clinician’s responses over those of leading evidence synthesis tools. Against an existing clinical AI system, the AI co-clinician received 67 preferences to that system’s 26; against GPT-5.4-thinking-with-search, it received 63 to 30. Notably, the AI co-clinician made a critical error in only one of the 98 cases evaluated.
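To put the preference counts above on a common scale, the short Python sketch below converts them into shares of the 98 queries. It is only an illustration: the function and variable names are made up for this example, and the article does not say how the queries outside each pair of counts were scored, so the sketch reports them as "unaccounted" rather than assuming they were ties.

```python
# Illustrative arithmetic on the preference counts reported in the article.
# Names are hypothetical; the handling of queries not covered by either
# count is unknown, so they are reported as "unaccounted".
TOTAL_QUERIES = 98

# comparator -> (preferences for the AI co-clinician, preferences for the comparator)
comparisons = {
    "existing clinical AI system": (67, 26),
    "GPT-5.4-thinking-with-search": (63, 30),
}


def preference_summary(wins, losses, total=TOTAL_QUERIES):
    """Return percentage shares of all queries and the leftover count."""
    return {
        "co_clinician_share": round(100 * wins / total, 1),
        "comparator_share": round(100 * losses / total, 1),
        "unaccounted": total - wins - losses,
    }


summaries = {
    name: preference_summary(wins, losses)
    for name, (wins, losses) in comparisons.items()
}

for name, summary in summaries.items():
    print(
        f"vs {name}: co-clinician {summary['co_clinician_share']}%, "
        f"comparator {summary['comparator_share']}%, "
        f"unaccounted queries: {summary['unaccounted']}"
    )
```

In both comparisons the two counts add up to 93, leaving five of the 98 queries outside the reported figures.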

Medication inquiries showed a clear lead over physicians and a narrower one over OpenAI’s model. The RxQA benchmark comprises 600 questions on active ingredients, interactions, and dosages, sourced from national drug directories and vetted by licensed pharmacists, and it proved difficult for primary care physicians. With reference materials, doctors answered 61.3 percent correctly; without external assistance, that figure dropped to 48.3 percent. The AI co-clinician scored 73.3 percent, narrowly ahead of GPT-5.4-thinking-with-search at 72.7 percent. The gap between the two models widened when questions were posed in an open-ended format, typical of real-world searches: here, the AI co-clinician achieved a quality score of 95.0 percent, compared to 90.9 percent for OpenAI’s model.
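The RxQA figures are easier to compare as percentage-point gaps. The sketch below is a simple illustration using only the scores quoted in this article; the dictionary keys and the helper name are hypothetical labels chosen for the example, and no statistical significance is implied.

```python
# Percentage-point gaps between the RxQA scores quoted in the article.
# Keys and helper names are illustrative labels, not from the study.
rxqa_scores = {
    "physicians_with_references": 61.3,
    "physicians_unaided": 48.3,
    "co_clinician_closed": 73.3,
    "gpt_closed": 72.7,
    "co_clinician_open_ended": 95.0,
    "gpt_open_ended": 90.9,
}


def point_gap(first, second, scores=rxqa_scores):
    """Difference in percentage points, rounded to one decimal place."""
    return round(scores[first] - scores[second], 1)


gaps = {
    "co-clinician vs GPT (original format)": point_gap(
        "co_clinician_closed", "gpt_closed"
    ),
    "co-clinician vs GPT (open-ended)": point_gap(
        "co_clinician_open_ended", "gpt_open_ended"
    ),
    "co-clinician vs physicians with references": point_gap(
        "co_clinician_closed", "physicians_with_references"
    ),
    "physicians with vs without references": point_gap(
        "physicians_with_references", "physicians_unaided"
    ),
}

for label, gap in gaps.items():
    print(f"{label}: {gap} points")
```

On these numbers the margin over OpenAI’s model grows from 0.6 points to 4.1 points in the open-ended format, while the margin over physicians using reference materials is 12.0 points.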

In addition to text-based support, Google DeepMind is exploring the AI co-clinician’s capabilities in telemedicine through real-time audio and video interactions. Partnering with physicians at Harvard and Stanford, researchers conducted a randomized simulation study with 20 synthetic clinical scenarios and 10 doctors playing the role of patients, resulting in 120 hypothetical telemedicine consultations. The AI co-clinician demonstrated abilities that extend beyond text-only systems, such as correcting a patient’s inhaler technique and guiding patients through shoulder exams to identify rotator cuff injuries.

In patient-facing dialogues, the AI co-clinician employs a dual-agent configuration: a “Planner” module oversees the conversation to ensure the “Talker” agent adheres to safe clinical practices. When utilized by doctors, the system emphasizes solid clinical evidence and conducts verification and citation checks during information retrieval.

Despite these advancements, the study found that experienced physicians outperformed the AI co-clinician overall across 140 assessed aspects of consultation quality, grouped into seven domains: triage, history taking, clinical reasoning, communication and counseling, treatment steps, recognizing warning signs, and conducting physical exams. The AI co-clinician matched or exceeded primary care physicians in 68 of those aspects, but it lagged behind seasoned doctors elsewhere, especially in identifying critical warning signs and carrying out thorough physical examinations. OpenAI’s GPT-realtime ranked lowest across all seven domains. The researchers concluded that AI systems like this are best used as supportive tools for healthcare professionals rather than substitutes for their clinical judgment.

It is uncertain whether this research initiative will evolve into a commercially available product. Although the results underscore progress in AI-driven evidence synthesis and telemedicine applications, a clear gap remains when compared to the expertise of experienced physicians, particularly in safety-critical scenarios. “While it’s early days, the promise is clear,” noted DeepMind researcher Alan Karthikesalingam.
