Hidden dangers in seeking medical advice from LLMs
3 days ago
Last year, ChatGPT passed the US Medical Licensing Exam and was reported to be “more empathetic” than real doctors. ChatGPT currently has around 180 million users; if a mere 10% of them have asked ChatGPT a medical question, that’s already a population roughly twice the size of New York City’s using ChatGPT like a doctor. There’s an ongoing explosion of medical chatbot startups building thin wrappers around ChatGPT to dole out medical advice. But ChatGPT is not a doctor, and using ChatGPT for medical advice is not only against OpenAI’s Usage policies, it can also be dangerous.
In this article, I identify four key problems with using existing general-purpose chatbots to answer patient-posed medical questions. I provide examples of each problem using real conversations with ChatGPT. I also explain why building a chatbot that can safely answer patient-posed questions is completely different from building a chatbot that can answer USMLE questions. Finally, I describe steps that everyone can take (patients, entrepreneurs, doctors, and companies like OpenAI) to make chatbots medically safer.
Notes
For readability I use the term “ChatGPT,” but the article applies to all publicly available general-purpose large language models (LLMs), including ChatGPT, GPT-4, Llama2, Gemini, and others. A few LLMs specifically designed for medicine do exist, like Med-PaLM; this article is not about those models. I’m focused here on general-purpose chatbots because (a) they have the most users; (b) they’re easy to access; and (c) many patients are already using them for medical advice.
In the chats with ChatGPT, I provide verbatim quotes of ChatGPT’s response, with ellipses […] to indicate material that was left out for brevity. I never left out anything that would’ve changed my assessment of ChatGPT’s response. For completeness, the full chat transcripts are provided in a Word document attached to the end of this article. The words “Patient:” and “ChatGPT:” are dialogue tags and were added afterwards for clarity. These dialogue tags were not part of the prompts or responses.
