Why You Should Not Ask ChatGPT for Medical Advice
The Sydney Morning Herald
Details
- Date Published
- 4 Apr 2024
- Priority Score
- 3
- Australian
- Yes
- Created
- 8 Mar 2025, 01:04 pm
Description
The more information you give a GP, the more likely you are to get a diagnosis. Not so for ChatGPT.
Summary
This article highlights the risks of using ChatGPT for medical advice, emphasizing the AI's inaccuracies when dealing with nuanced medical questions. Australian researchers found that while ChatGPT can handle simple yes or no questions with some reliability, its accuracy drops significantly when given more complex queries. This has implications for AI safety, especially considering the widespread use and influence of AI in sensitive domains like healthcare. The discussion includes insights from experts on the need for regulation and responsible deployment of AI in healthcare, pointing to the potential dangers of using untested AI systems for critical health decisions. The article underscores Australia's role in AI governance within healthcare through the adoption of a national AI road map for safe and responsible AI implementation.
Body
By Angus Thomson
April 4, 2024 — 6.00am

If you asked a doctor whether to use ice to treat a burn, they would quickly advise you to run it under cold water instead. Even "Dr Google" will tell you that extreme cold constricts the blood vessels and can make a burn worse. But what happens when you ask ChatGPT the same question? The chatbot will tell you it's fine to use ice – so long as you wrap it in a towel.

The question is one of a hundred common health queries that Australian researchers used to test the chatbot's ability to provide medical advice. They found the software was fairly accurate when asked to provide a yes or no answer, but became less reliable when given more information – answering some questions with just 28 per cent accuracy.

Co-author Dr Bevan Koopman, CSIRO principal research scientist and associate professor at the University of Queensland, has spent years looking at how search engines are used in healthcare. He said people were increasingly using tools such as ChatGPT for medical advice despite the well-documented pitfalls of seeking health information online.

"These models have come on to the scene so quickly ... but there isn't really the understanding of how well they perform and how best to deploy them," he said. "In the end, you want reliable medical advice … and these models are not at all appropriate for doing things like diagnosis."

The study compared ChatGPT's responses to known correct responses for a set of questions developed to test the accuracy of search engines such as Google. It answered correctly 80 per cent of the time when asked to give a yes or no answer.
But when provided with supporting evidence in the prompt, accuracy was reduced to 63 per cent, and fell to 28 per cent when an "unsure" answer was allowed. Inverting the prompts to frame the question as a negative also reduced the accuracy of its answers – from 80 per cent to 56 per cent for the yes/no option, and from 33 per cent to just 4 per cent when it was given a third option of "unsure".

Koopman said large language models such as ChatGPT were only as good as the information they were trained on, and hoped the study would provide a stepping stone for the next generation of health-specific tools "that would be much more effective".

A national road map for artificial intelligence (AI) in healthcare, released last year, recommended the government "urgently communicate the need for caution" when using generative AI that is untested and unregulated in healthcare settings.

[Photo: Dr Bevan Koopman, CSIRO principal research scientist and associate professor at the University of Queensland.]

Professor Enrico Coiera, the director of Macquarie University's Centre for Health Informatics and one of the authors of the road map, said some doctors were using large language models to help them take patient notes and write letters, but these tools had so far avoided the regulation and testing hurdles that every other health technology has to go through.

"In Silicon Valley they say, 'move fast and break things'. That's not a good mantra in healthcare, where the things you might break are people," he said.

Large language models construct sentences by assessing a huge database of words and how often they appear next to each other.
They are chatty and easy to use but "don't know anything about medicine", Coiera said, and therefore should be supported by another kind of AI that can better answer health-related questions.

Dr Rob Hosking, a GP and the chairman of the Royal Australian College of General Practitioners' technology committee, said there was a place for large language models in healthcare "if it's trained on medical-quality data, and supervised by a clinician who knows how to understand the data".

"It's really no different from our perspective – people come in with information they've got from friends, family or the internet," he said. "It's a bit like the move from using pen and paper to using a word processor – it's a tool. We can't take it as gospel."

Angus Thomson is a reporter covering health at the Sydney Morning Herald.