Bibliographic Details
Main Authors: Luo, Yihong; He, Wenwu; Cui, Zhuo-Xu; Liang, Dong
Format: Preprint
Published: 2025
Subjects: Artificial Intelligence
Online Access: https://arxiv.org/abs/2509.06409
author Luo, Yihong
He, Wenwu
Cui, Zhuo-Xu
Liang, Dong
contents This study presents DiagCoT, a multi-stage framework that applies supervised fine-tuning to general-purpose vision-language models (VLMs) to emulate radiologists' stepwise diagnostic reasoning using only free-text reports. DiagCoT combines contrastive image-report tuning for domain alignment, chain-of-thought supervision to capture inferential logic, and reinforcement tuning with clinical reward signals to enhance factual accuracy and fluency. On the MIMIC-CXR benchmark, DiagCoT improved zero-shot disease classification AUC from 0.52 to 0.76 (absolute gain of 0.24), pathology grounding mIoU from 0.08 to 0.31 (absolute gain of 0.23), and report generation BLEU from 0.11 to 0.33 (absolute gain of 0.22). It outperformed state-of-the-art models including LLaVA-Med and CXR-LLAVA on long-tailed diseases and external datasets. By converting unstructured clinical narratives into structured supervision, DiagCoT offers a scalable approach for developing interpretable and diagnostically competent AI systems for radiology.
format Preprint
id arxiv_https___arxiv_org_abs_2509_06409
institution arXiv
publishDate 2025
record_format arxiv
title Teaching AI Stepwise Diagnostic Reasoning with Report-Guided Chain-of-Thought Learning
topic Artificial Intelligence
url https://arxiv.org/abs/2509.06409