Staff View: Teaching AI Stepwise Diagnostic Reasoning with Report-Guided Chain-of-Thought Learning


Bibliographic Details
Main Authors:	Luo, Yihong; He, Wenwu; Cui, Zhuo-Xu; Liang, Dong
Format:	Preprint
Published:	2025
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2509.06409

_version_	1866916939588173824
author	Luo, Yihong; He, Wenwu; Cui, Zhuo-Xu; Liang, Dong
author_facet	Luo, Yihong; He, Wenwu; Cui, Zhuo-Xu; Liang, Dong
contents	This study presents DiagCoT, a multi-stage framework that applies supervised fine-tuning to general-purpose vision-language models (VLMs) to emulate radiologists' stepwise diagnostic reasoning using only free-text reports. DiagCoT combines contrastive image-report tuning for domain alignment, chain-of-thought supervision to capture inferential logic, and reinforcement tuning with clinical reward signals to enhance factual accuracy and fluency. On the MIMIC-CXR benchmark, DiagCoT improved zero-shot disease classification AUC from 0.52 to 0.76 (absolute gain of 0.24), pathology grounding mIoU from 0.08 to 0.31 (absolute gain of 0.23), and report generation BLEU from 0.11 to 0.33 (absolute gain of 0.22). It outperformed state-of-the-art models including LLaVA-Med and CXR-LLAVA on long-tailed diseases and external datasets. By converting unstructured clinical narratives into structured supervision, DiagCoT offers a scalable approach for developing interpretable and diagnostically competent AI systems for radiology.
format	Preprint
id	arxiv_https___arxiv_org_abs_2509_06409
institution	arXiv
publishDate	2025
record_format	arxiv
title	Teaching AI Stepwise Diagnostic Reasoning with Report-Guided Chain-of-Thought Learning
topic	Artificial Intelligence
url	https://arxiv.org/abs/2509.06409
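The abstract reports each MIMIC-CXR result as a baseline-to-DiagCoT change plus an absolute gain. The short Python check below uses only the figures quoted in the contents field to confirm the stated absolute gains and to show the relative gains they imply. The `REPORTED` table and the `gains` helper are illustrative names chosen here, not part of the paper or any released code.

```python
# Baseline and DiagCoT scores on MIMIC-CXR, as quoted in the abstract.
# The names REPORTED and gains() are illustrative, not from the paper.
REPORTED = {
    "zero-shot classification AUC": (0.52, 0.76),
    "pathology grounding mIoU": (0.08, 0.31),
    "report generation BLEU": (0.11, 0.33),
}


def gains(before: float, after: float) -> tuple[float, float]:
    """Return (absolute gain rounded to 2 decimals, relative gain as a fraction of the baseline)."""
    absolute = round(after - before, 2)  # rounding hides float noise such as 0.24000000000000002
    relative = (after - before) / before
    return absolute, relative


if __name__ == "__main__":
    for metric, (before, after) in REPORTED.items():
        absolute, relative = gains(before, after)
        print(f"{metric}: {before:.2f} -> {after:.2f}, "
              f"absolute +{absolute:.2f}, relative +{relative:.0%}")
```

The absolute gains (0.24, 0.23, 0.22) match the abstract. The relative view shows that the grounding and BLEU improvements are proportionally much larger than the AUC improvement, because their baselines are low.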
