Staff View: Medical Reasoning with Large Language Models: A Survey and MR-Bench :: Library Catalog

Bibliographic Details
Main Authors:	Ren, Xiaohan; Fan, Chenxiao; Ma, Wenyin; He, Hongliang; Gao, Chongming; Zhao, Xiaoyan; Feng, Fuli
Format:	Preprint
Published:	2026
Subjects:	Computation and Language; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2604.08559

_version_	1866917397964783616
author	Ren, Xiaohan; Fan, Chenxiao; Ma, Wenyin; He, Hongliang; Gao, Chongming; Zhao, Xiaoyan; Feng, Fuli
author_facet	Ren, Xiaohan; Fan, Chenxiao; Ma, Wenyin; He, Hongliang; Gao, Chongming; Zhao, Xiaoyan; Feng, Fuli
contents	Large language models (LLMs) have achieved strong performance on medical exam-style tasks, motivating growing interest in their deployment in real-world clinical settings. However, clinical decision-making is inherently safety-critical, context-dependent, and conducted under evolving evidence. In such situations, reliable LLM performance depends not on factual recall alone, but on robust medical reasoning. In this work, we present a comprehensive review of medical reasoning with LLMs. Grounded in cognitive theories of clinical reasoning, we conceptualize medical reasoning as an iterative process of abduction, deduction, and induction, and organize existing methods into seven major technical routes spanning training-based and training-free approaches. We further conduct a unified cross-benchmark evaluation of representative medical reasoning models under a consistent experimental setting, enabling a more systematic and comparable assessment of the empirical impact of existing methods. To better assess clinically grounded reasoning, we introduce MR-Bench, a benchmark derived from real-world hospital data. Evaluations on MR-Bench expose a pronounced gap between exam-level performance and accuracy on authentic clinical decision tasks. Overall, this survey provides a unified view of existing medical reasoning methods, benchmarks, and evaluation practices, and highlights key gaps between current model performance and the requirements of real-world clinical reasoning.
format	Preprint
id	arxiv_https___arxiv_org_abs_2604_08559
institution	arXiv
publishDate	2026
record_format	arxiv
title	Medical Reasoning with Large Language Models: A Survey and MR-Bench
topic	Computation and Language; Artificial Intelligence
url	https://arxiv.org/abs/2604.08559
