ChatGPT won’t be replacing doctors anytime soon, with a CSIRO study showing the AI tool becomes less reliable as it is provided with more health information.
The world-first study found that when ChatGPT was given more evidence alongside a health-related question, the accuracy of its responses fell to as low as 28 percent.
CSIRO Principal Research Scientist Dr Bevan Koopman said that even though the risks of searching for health information online were well documented, people continued to do so, and increasingly via large language models (LLMs) like ChatGPT.
Scientists from CSIRO explored a hypothetical scenario of an average person (a non-professional health consumer) asking ChatGPT whether treatment ‘X’ had a positive effect on condition ‘Y’.
The 100 questions presented ranged from ‘Can zinc help treat the common cold?’ to ‘Will drinking vinegar dissolve a stuck fish bone?’
ChatGPT’s response was compared to the known correct response, based on existing medical knowledge.
Dr Koopman said while LLMs had the potential to improve the way people accessed information, more research was needed to understand where they were effective and where they were not.
The study looked at two question formats. The first was the question on its own. The second was the same question biased with evidence that either supported or contradicted the correct answer.
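To illustrate the difference between the two formats, here is a minimal sketch, not the study’s actual code, of how such prompts might be posed to ChatGPT via the OpenAI Python client. The model name, question wording, and evidence passage are illustrative assumptions.

```python
# Illustrative sketch only: not the CSIRO study's code. The model name,
# question wording and evidence snippet are assumptions for demonstration.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

question = "Can zinc help treat the common cold? Answer Yes or No."

# Format 1: the question on its own.
question_only = [{"role": "user", "content": question}]

# Format 2: the same question biased with a supporting (or contrary) evidence passage.
evidence = (
    "A review reported that zinc lozenges shortened the duration of cold "
    "symptoms in some trials."  # hypothetical evidence snippet
)
evidence_biased = [{"role": "user", "content": f"{evidence}\n\n{question}"}]

for label, messages in [("question only", question_only),
                        ("evidence-biased", evidence_biased)]:
    reply = client.chat.completions.create(model="gpt-3.5-turbo",
                                           messages=messages)
    print(label, "->", reply.choices[0].message.content)
```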
Results revealed that ChatGPT was quite good at giving accurate answers in the question-only format, achieving 80 percent accuracy.
However, when the language model was given an evidence-biased prompt, accuracy reduced to 63 percent.
Accuracy fell further, to 28 percent, when ChatGPT was allowed to give an “unsure” answer. This finding runs contrary to the popular belief that prompting with evidence improves accuracy.
“We’re not sure why this happens. But given this occurs whether the evidence given is correct or not, perhaps the evidence adds too much noise, thus lowering accuracy,” Dr Koopman said.