Study Reveals AI Chatbots Provide Problematic Medical Advice in 20% of Cases

A recent study published in the journal BMJ Open has raised significant concerns about the reliability of AI chatbots as sources of health and medical information. The researchers evaluated five prominent AI chatbots (ChatGPT, Gemini, Grok, Meta AI, and DeepSeek) by posing 50 diverse medical questions to each. Nearly 20 per cent of all responses contained problematic information, and roughly half of the answers exhibited at least one issue that could mislead users.

Performance Varies Widely Among Chatbots

The study highlighted stark differences in performance across the tested AI systems. Grok was the poorest performer, with 58 per cent of its answers deemed problematic, including inaccuracies, incomplete information, and potentially harmful advice. ChatGPT and Gemini fared better, though they still fell short of providing consistently reliable medical guidance.

Specific Challenges in Health Queries

Researchers noted that the chatbots struggled particularly with open-ended health questions, where nuanced or context-dependent responses are required. Nutrition proved to be a weak spot, with chatbots often providing oversimplified or incorrect dietary recommendations. The systems performed better on more structured subjects, such as vaccines and cancer, where factual data is more readily available and less ambiguous.

Unreliable References and Fabricated Sources

A particularly alarming finding was the unreliability of the references these chatbots generated. None of the five systems produced a fully accurate list of sources for its answers; many responses cited fabricated academic papers or included broken links to non-existent studies, undermining the credibility of the information provided. This raises serious concerns about the potential for spreading medical misinformation.

Expert Recommendations and Cautions

Medical experts and researchers involved in the study strongly caution against relying on AI chatbots as definitive medical authorities. While these tools can be useful for summarising information or helping users prepare questions for healthcare professionals, they should not be used for self-diagnosis or treatment decisions. The study underscores the importance of consulting qualified medical practitioners for accurate and personalised health advice.

The research also calls for improved transparency and validation mechanisms in AI development, particularly for applications in sensitive areas like healthcare. As AI technology continues to evolve, ensuring its safe and ethical use in medical contexts remains a pressing challenge for developers, regulators, and users alike.
