ChatGPT Health Under-Triages Over Half of Medical Emergencies, Study Finds

Independent research published in the February issue of Nature Medicine has found that ChatGPT Health, OpenAI's AI health platform, fails to recommend a hospital visit in more than 50% of cases where urgent medical care is needed. The study, the first safety evaluation of the platform, highlights risks that experts warn could lead to preventable harm and death.

Alarming Under-Triage Rates in Simulated Scenarios

Dr. Ashwin Ramaswamy, lead author of the study and a urology instructor at the Icahn School of Medicine at Mount Sinai, explained that the research aimed to answer a fundamental safety question: whether ChatGPT Health would direct users to emergency departments during real medical crises. The team created 60 realistic patient scenarios, ranging from mild illnesses to severe emergencies, which were reviewed by three independent doctors to establish the appropriate level of care based on clinical guidelines.

The team then queried ChatGPT Health under varied conditions, such as changing the patient's gender, adding test results, or including comments from family members, generating nearly 1,000 responses. Compared against the doctors' assessments, the platform advised staying home or scheduling a routine appointment in 51.6% of cases that required immediate hospitalisation.
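For readers curious how such an evaluation is typically scored, the sketch below shows one way an under-triage rate can be computed from labelled responses. It is a minimal illustration only, not the study's actual code; the scenario data, acuity levels, and function names are hypothetical.

```python
# Minimal sketch of scoring under-triage in a triage evaluation.
# Hypothetical data and names; not the study's actual pipeline.

from dataclasses import dataclass

# Ordered acuity levels: higher value = more urgent care.
LEVELS = {"self_care": 0, "routine_appointment": 1, "emergency_department": 2}

@dataclass
class Response:
    scenario_id: str
    gold_level: str   # consensus level of care from the reviewing physicians
    model_level: str  # level of care recommended by the model

def under_triage_rate(responses):
    """Share of emergency-level scenarios where the model advised a lower level of care."""
    emergencies = [r for r in responses if r.gold_level == "emergency_department"]
    if not emergencies:
        return 0.0
    missed = [r for r in emergencies if LEVELS[r.model_level] < LEVELS[r.gold_level]]
    return len(missed) / len(emergencies)

# Example: two emergency scenarios, one under-triaged -> rate of 50%
sample = [
    Response("asthma_01", "emergency_department", "routine_appointment"),
    Response("stroke_01", "emergency_department", "emergency_department"),
]
print(f"Under-triage rate: {under_triage_rate(sample):.1%}")
```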


Specific Failures and Expert Concerns

While ChatGPT Health performed adequately in textbook emergencies such as strokes or severe allergic reactions, it struggled in other critical situations. In an asthma scenario, for instance, the AI recommended waiting despite identifying early signs of respiratory failure. Alex Ruani, a doctoral researcher at University College London who studies health misinformation mitigation, described the findings as "unbelievably dangerous." She noted that in one simulation, the platform directed a woman who was struggling to breathe to a future appointment she would not survive in 84% of runs.

Ruani emphasised the false sense of security such systems can create, stating, "If someone is told to wait 48 hours during an asthma attack or diabetic crisis, that reassurance could cost them their life." Additionally, the study found that ChatGPT Health was nearly 12 times more likely to downplay symptoms if a "friend" in the scenario suggested they were not serious, further compounding the risk.

Suicidal Ideation Detection Flaws

Dr. Ramaswamy expressed particular concern over the platform's handling of suicidal ideation. In tests with a 27-year-old patient contemplating overdose, the crisis intervention banner appeared consistently when symptoms were described alone. However, when normal lab results were added, the banner vanished in all 16 attempts. "A crisis guardrail that depends on whether you mentioned your labs is not ready, and it's arguably more dangerous than having no guardrail at all," he warned.

Broader Implications and Calls for Safeguards

Professor Paul Henman, a digital sociologist and policy expert at the University of Queensland, praised the study as "really important," noting that widespread use of ChatGPT Health could lead to both unnecessary medical presentations for minor conditions and failures to seek urgent care when needed. He highlighted potential legal liabilities, referencing ongoing cases against tech companies related to suicide and self-harm from AI chatbot use.

Ruani and other experts are advocating for urgent development of clear safety standards and independent auditing mechanisms to mitigate preventable harm. Despite OpenAI's response that the study doesn't reflect typical real-life usage and that the model is continuously updated, Ruani argued that "a plausible risk of harm is enough to justify stronger safeguards and independent oversight."

With over 40 million people reportedly seeking health advice from ChatGPT daily, these findings underscore the pressing need for robust regulatory frameworks to ensure AI tools in healthcare do not compromise patient safety.
