Staff View: Human-AI Co-reasoning for Clinical Diagnosis with Evidence-Integrated Language Agent :: Library Catalog

Bibliographic Details
Main Authors:	Huang, Zhongzhen; Ling, Yan; Chen, Hong; Feng, Ye; Wu, Li; Mu, Linjie; Zhang, Shaoting; Zhang, Xiaofan; Qian, Kun; Li, Xiaomu
Format:	Preprint
Published:	2026
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2603.10492

_version_	1866917350867992576
author	Huang, Zhongzhen; Ling, Yan; Chen, Hong; Feng, Ye; Wu, Li; Mu, Linjie; Zhang, Shaoting; Zhang, Xiaofan; Qian, Kun; Li, Xiaomu
author_facet	Huang, Zhongzhen; Ling, Yan; Chen, Hong; Feng, Ye; Wu, Li; Mu, Linjie; Zhang, Shaoting; Zhang, Xiaofan; Qian, Kun; Li, Xiaomu
contents	We present PULSE, a medical reasoning agent that combines a domain-tuned large language model with scientific literature retrieval to support diagnostic decision-making in complex real-world cases. To evaluate its capabilities, we curated a benchmark of 82 authentic endocrinology case reports encompassing a broad spectrum of disease types and incidence levels. In controlled experiments, we compared PULSE's performance against physicians with varying levels of expertise, from residents to senior specialists, and examined how AI assistance influenced human diagnostic reasoning. PULSE attained expert-competitive accuracy, outperforming residents and junior specialists while matching senior specialist performance at both Top@1 and Top@4 thresholds. Unlike physicians, whose accuracy declined with disease rarity, PULSE maintained stable performance across incidence tiers. The agent also exhibited adaptive reasoning, increasing output length with case difficulty in a manner analogous to the longer deliberation observed among expert clinicians. When used collaboratively, PULSE enabled physicians to correct initial errors and broaden diagnostic hypotheses, but also introduced risks of automation bias. The study explores both serial and concurrent collaboration workflows, revealing that PULSE offers robust support across common and rare presentations. These findings underscore both the promise and the limitations of language model-based agents in clinical diagnosis, and offer a framework for evaluating their role in real-world decision-making.
format	Preprint
id	arxiv_https___arxiv_org_abs_2603_10492
institution	arXiv
publishDate	2026
record_format	arxiv
title	Human-AI Co-reasoning for Clinical Diagnosis with Evidence-Integrated Language Agent
topic	Computation and Language
url	https://arxiv.org/abs/2603.10492

Similar Items