
Article • Conversational AI in medicine

How to teach an LLM to think like a clinician

While generative AI shows immense potential for healthcare, a critical reliability issue lurks beneath the surface: LLMs don't think like doctors do, a data science expert explained at the Emerging Technologies in Medicine (ETIM) congress in Essen. This potentially fatal flaw, however, may be fixable, he suggested.

Article: Wolfgang Behrends

From patient communication assistance to clinical decision support and automated reporting – Prof. Michael Gertz pointed out how LLMs show great promise in helping clinicians with almost every task across the patient journey.1 However, the models suffer from fluctuating performance and therefore lack the reliability needed for sensitive healthcare applications, explained the Head of the Data Science Group at Heidelberg University.2

To understand the frequent inaccuracies in the AI output, ‘it is important to keep in mind that an LLM is basically just a machine for next word prediction’, Gertz said. While this may be useful for tasks requiring a certain degree of creativity, it can lead to potentially severe errors in a medical context.
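To make the point concrete, the minimal Python sketch below shows what such a machine actually computes: a probability ranking over possible next tokens, not a clinical judgement. The small open model ‘gpt2’, the Hugging Face transformers library and the prompt are illustrative assumptions, not tools mentioned in the talk.

```python
# A minimal sketch of next-word prediction, assuming the small open model "gpt2"
# and the Hugging Face transformers library; model choice and prompt are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompt = "The patient presents with chest pain and shortness of"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, sequence_length, vocab_size)

# The model only ranks possible continuations of the text;
# it does not reason about the underlying clinical case.
next_token_probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(next_token_probs, k=5)
for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(int(token_id))!r:>12}  p={float(prob):.3f}")
```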


‘LLMs demonstrate remarkable language generation skills, which can easily be mistaken for genuine reasoning,’ the expert cautioned. ‘In high-stakes domains like medicine, this illusion can lead to real harm. This is why, despite their promise, these models must be integrated carefully, with human oversight and robust methods for verifying correctness and sources.’

Taking it one step at a time

Clinicians and patients want to know where a given piece of information is coming from – from a peer-reviewed study, a reputable guideline, or just online chatter?

Michael Gertz

Efforts to improve the reliability of LLM output include careful prompt engineering, advanced techniques such as retrieval-augmented generation (RAG), and thorough fine-tuning in safe testing environments. Using these methods, it is possible to guide the AI models towards an approximation of clinical reasoning, Gertz suggested. This includes concepts such as evidence-based decision-making, pattern recognition, and probabilistic thinking.
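As a rough illustration of the RAG idea, the sketch below retrieves curated guideline snippets for a query and prepends them to the prompt, so the model is steered towards answering from vetted evidence rather than from its training data alone. The snippets, the word-overlap scoring and the prompt wording are simplified assumptions; a real system would use a vector search over validated clinical sources.

```python
# A minimal sketch of retrieval-augmented generation (RAG): guideline snippets
# relevant to a query are retrieved and prepended to the prompt. The snippets,
# the scoring heuristic and the prompt wording are illustrative assumptions.

GUIDELINE_SNIPPETS = [
    "Guideline A: In suspected pulmonary embolism, compute a Wells score before imaging.",
    "Guideline B: For community-acquired pneumonia, assess severity with CURB-65.",
    "Guideline C: Measure troponin on arrival in suspected acute coronary syndrome.",
]

def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Rank documents by word overlap with the query (a stand-in for vector search)."""
    query_terms = set(query.lower().split())
    scored = sorted(
        documents,
        key=lambda doc: len(query_terms & set(doc.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(query: str) -> str:
    """Assemble a prompt that instructs the model to answer only from the retrieved sources."""
    context = "\n".join(retrieve(query, GUIDELINE_SNIPPETS))
    return (
        "Answer the clinical question using ONLY the sources below and cite them.\n"
        f"Sources:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )

print(build_prompt("Which score should be computed in suspected pulmonary embolism?"))
```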

A suitable way to introduce an LLM to this line of reasoning can be chain-of-thought (CoT) prompting – a method derived from the structured interviews performed in psychotherapy, the expert explained. Essentially, this replaces the LLM’s associative default mode of operation with a more deliberate step-by-step approach. Adopting this format, an LLM can be trained to follow decision trees, eventually arriving at a way of reasoning closer to that of a clinician, who works through causal chains before coming to a diagnosis.3
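A minimal sketch of what such a prompt might look like is shown below; the case vignette and the step-by-step instructions are hypothetical examples, not the prompts used by the speaker.

```python
# A minimal sketch of chain-of-thought (CoT) prompting: instead of asking for a
# diagnosis directly, the prompt instructs the model to work through the case
# step by step before committing to an answer. The case text and the instruction
# wording are hypothetical examples.

case = (
    "A 58-year-old smoker presents with sudden pleuritic chest pain, "
    "tachycardia and an oxygen saturation of 91%."
)

direct_prompt = f"{case}\nWhat is the most likely diagnosis?"

cot_prompt = (
    f"{case}\n"
    "Reason step by step before answering:\n"
    "1. List the key findings.\n"
    "2. Name the differential diagnoses consistent with them.\n"
    "3. State which finding rules each alternative in or out.\n"
    "4. Only then give the single most likely diagnosis.\n"
)

# Both prompts would be sent to the same model; the CoT version makes the
# intermediate causal chain explicit and therefore easier for a clinician to audit.
print(cot_prompt)
```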

Transparent attribution is key

[Photo: Prof. Michael Gertz during his presentation at ETIM in Essen; image credit: HiE/Behrends]

While the aforementioned measures reduce the likelihood of AI-generated “hallucinations”, several challenges with current LLMs must be addressed before safe adoption in a medical setting, Gertz said. For one, the models are often trained on huge, uncurated datasets, which makes pinpointing the origin of a single statement nearly impossible. ‘In healthcare, trust and accountability hinge on traceability,’ the expert stressed. ‘Clinicians and patients want to know where a given piece of information is coming from – from a peer-reviewed study, a reputable guideline, or just online chatter?’ Consequently, thorough and transparent attribution of sources is vital – especially since current models tend to simply invent citations to support their statements, a serious ethical and practical problem in medical contexts, Gertz added.
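One way such attribution could be checked after generation is sketched below, under the assumption that sources are passed to the model with bracketed labels such as “[Guideline A]”: any citation that does not refer to a supplied source is flagged for human review. This is an illustrative heuristic, not a method presented at the congress.

```python
# A minimal sketch of a post-hoc attribution check: every citation the model emits
# must match a source that was actually supplied in the prompt. The bracketed
# citation format and the source labels are illustrative assumptions.
import re

SUPPLIED_SOURCES = {"Guideline A", "Guideline B", "Guideline C"}

def unsupported_citations(model_answer: str) -> set[str]:
    """Return citations in the answer that do not correspond to any supplied source."""
    cited = set(re.findall(r"\[([^\]]+)\]", model_answer))
    return cited - SUPPLIED_SOURCES

answer = "Compute a Wells score first [Guideline A]; see also [Smith et al. 2021]."
print(unsupported_citations(answer))  # {'Smith et al. 2021'} -> flag for human review
```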

Further issues include significant drops in LLM reliability when the models are confronted with novel or complex circumstances, and the as yet unclear legal responsibility in case of faulty medical advice. ‘For the foreseeable future, there will always have to be a human in the loop’, Gertz concluded – ‘a medical professional to interpret and validate outputs before any critical decision-making.’


Profile: 

Michael Gertz is a full professor at Heidelberg University, where he heads the Data Science Group at the Faculty of Mathematics and Computer Science. From 1997 until 2008, he was a faculty member at the Department of Computer Science of the University of California, Davis. His interdisciplinary research interests include natural language processing, AI, complex networks, and scientific data management, with applications in the medical sciences, law, physics, political sciences, and economics.


References: 

  1. Bhayana R: Chatbots and Large Language Models in Radiology: A Practical Primer for Clinical and Research Applications; Radiology 2024; https://doi.org/10.1148/radiol.232756 
  2. Gupta M, Virostko J, Kaufmann C: Large language models in radiology: Fluctuating performance and decreasing discordance over time; European Journal of Radiology 2025; https://doi.org/10.1016/j.ejrad.2024.111842 
  3. Liévin V, Hother CE, Motzfeldt AG, Winther O: Can large language models reason about medical questions?; arXiv preprint 2023; https://doi.org/10.48550/arXiv.2207.08143

15.04.2025
