Bibliographic Details
Main Authors: Huang, Zhongzhen, Ling, Yan, Chen, Hong, Feng, Ye, Wu, Li, Mu, Linjie, Zhang, Shaoting, Zhang, Xiaofan, Qian, Kun, Li, Xiaomu
Format: Preprint
Published: 2026
Subjects: Computation and Language
Online Access: https://arxiv.org/abs/2603.10492
author Huang, Zhongzhen
Ling, Yan
Chen, Hong
Feng, Ye
Wu, Li
Mu, Linjie
Zhang, Shaoting
Zhang, Xiaofan
Qian, Kun
Li, Xiaomu
contents We present PULSE, a medical reasoning agent that combines a domain-tuned large language model with scientific literature retrieval to support diagnostic decision-making in complex real-world cases. To evaluate its capabilities, we curated a benchmark of 82 authentic endocrinology case reports spanning a broad range of disease types and incidence levels. In controlled experiments, we compared PULSE's performance with that of physicians at varying levels of expertise, from residents to senior specialists, and examined how AI assistance influenced human diagnostic reasoning. PULSE attained expert-competitive accuracy, outperforming residents and junior specialists while matching senior specialists at both the Top@1 and Top@4 thresholds. Unlike physicians, whose accuracy declined with disease rarity, PULSE maintained stable performance across incidence tiers. The agent also exhibited adaptive reasoning, producing longer outputs for more difficult cases, analogous to the longer deliberation observed among expert clinicians. When used collaboratively, PULSE enabled physicians to correct initial errors and broaden their diagnostic hypotheses, but it also introduced a risk of automation bias. The study examines both serial and concurrent collaboration workflows and finds that PULSE offers robust support across common and rare presentations. These findings underscore both the promise and the limitations of language model-based agents in clinical diagnosis and offer a framework for evaluating their role in real-world decision-making.
format Preprint
id arxiv_https___arxiv_org_abs_2603_10492
institution arXiv
publishDate 2026
record_format arxiv
title Human-AI Co-reasoning for Clinical Diagnosis with Evidence-Integrated Language Agent
topic Computation and Language
url https://arxiv.org/abs/2603.10492