Newswise — New York, NY [February 24, 2026] — A recent evaluation by researchers from the Icahn School of Medicine at Mount Sinai has raised serious concerns about the performance of ChatGPT Health, a consumer artificial intelligence (AI) tool designed to provide health guidance. The study, published in the February 23, 2026 online issue of Nature Medicine, suggests that the tool may inadequately direct users to emergency care in numerous serious situations, particularly in cases of self-harm.
ChatGPT Health, launched in January 2026 by OpenAI, has already attracted around 40 million daily users seeking health information and advice on urgent care. However, the researchers highlighted a lack of independent evidence regarding its safety and reliability, which prompted their investigation. Lead author Ashwin Ramaswamy, MD, noted, “We wanted to answer a very basic but critical question: if someone is experiencing a real medical emergency and turns to ChatGPT Health for help, will it clearly tell them to go to the emergency room?”
The study used 60 structured clinical scenarios spanning 21 medical specialties, with three independent physicians rating the urgency of each case against guidelines from 56 medical societies. The results indicated that while ChatGPT Health handled straightforward emergencies appropriately, it under-triaged more than half of the cases the physicians deemed urgent. For example, the system correctly flagged clear-cut emergencies such as strokes but struggled with nuanced situations where clinical judgment is crucial.
In tests involving suicide-risk alerts, the tool was intended to direct users to the 988 Suicide and Crisis Lifeline in high-risk situations. However, the researchers found that these alerts were inconsistent, sometimes triggering in lower-risk scenarios while failing to appear when users outlined specific self-harm plans. Girish N. Nadkarni, MD, MPH, a senior author of the study, expressed alarm over this finding, stating, “When someone talks about exactly how they would harm themselves, that’s a sign of more immediate and serious danger, not less.”
In total, the team conducted 960 interactions with ChatGPT Health across varied contextual conditions, including differences in race and gender and barriers to care such as lack of insurance. The findings showed that the AI tool often recognized dangerous indicators in its explanations yet continued to reassure users, ultimately failing to prompt necessary action in critical scenarios. In one asthma case, for instance, the tool identified early signs of respiratory failure but still recommended delaying emergency treatment.
The researchers caution users that for serious symptoms such as chest pain, shortness of breath, severe allergic reactions, or suicidal thoughts, they should seek medical assistance directly rather than relying solely on AI recommendations. While the findings raise significant concerns, the authors do not advocate for abandoning AI health tools altogether. Alvira Tyagi, a first-year medical student and co-author of the study, emphasized the need to integrate such technologies thoughtfully into medical care rather than view them as substitutes for professional clinical judgment.
As AI models continue to evolve, the researchers stress the importance of ongoing independent evaluations to ensure that updates lead to improved safety in patient care. Tyagi remarked, “Starting medical training alongside tools that are evolving in real time makes it clear that today’s results are not set in stone.” The study aims to continue assessing updated versions of ChatGPT Health, with future research focusing on pediatric care, medication safety, and non-English-language usage.
The paper, titled “ChatGPT Health performance in a structured test of triage recommendations,” highlights the urgent need for a careful examination of AI tools in healthcare settings. With millions turning to these technologies for health guidance, ensuring their reliability and safety is more critical than ever.