Staff View: MMRad-22K: A Structured Multimodal Evidence Dataset for Chest X-ray Report Generation :: Library Catalog

Bibliographic Details
Main Authors:	Zhao, Yichen; Peng, Zelin; Tang, Fenghe; Yang, Piao; Huang, Yu; Shen, Wei
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2602.12843

_version_	1866914608205266944
author	Zhao, Yichen; Peng, Zelin; Tang, Fenghe; Yang, Piao; Huang, Yu; Shen, Wei
author_facet	Zhao, Yichen; Peng, Zelin; Tang, Fenghe; Yang, Piao; Huang, Yu; Shen, Wei
contents	Chest X-ray (CXR) reporting follows a region-based clinical workflow in which radiologists inspect anatomical regions and integrate localized findings into a final report. However, existing resources for CXR report generation provide these supervision signals in fragmented forms. We introduce MMRad-22K, a dataset that organizes regional textual observations, anatomical grounding coordinates, localized image evidence, and report targets into structured multimodal evidence units for CXR report generation. To motivate this formulation, we first compare different evidence formats for report generation and find that structured multimodal evidence is generally more useful than text-only or bounding box-based evidence. We then adapt a unified LVLM backbone using MMRad-22K and show that adaptation with multimodal evidence outperforms both textual-evidence adaptation and end-to-end adaptation on language and clinically oriented metrics. Under the same evaluation protocol, the adapted model also reaches a performance level comparable to several open-source LVLM references. Together, these results support MMRad-22K as a practical structured multimodal resource for training and evaluating CXR report generation aligned with clinical reading workflows.
format	Preprint
id	arxiv_https___arxiv_org_abs_2602_12843
institution	arXiv
publishDate	2026
record_format	arxiv
title	MMRad-22K: A Structured Multimodal Evidence Dataset for Chest X-ray Report Generation
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2602.12843

Similar Items