Bibliographic Details
Main Authors: Zhu, Jiayi; Huang, Fuxiang; Xie, Yu; Wang, Xi; Chen, Zhixuan; Guo, Yuan; Kong, Qingcong; Li, Zhenhui; Luo, Qiong; Chen, Hao
Format: Preprint
Published: 2026
Subjects: Computer Vision and Pattern Recognition
Online Access: https://arxiv.org/abs/2605.31093
author Zhu, Jiayi
Huang, Fuxiang
Xie, Yu
Wang, Xi
Chen, Zhixuan
Guo, Yuan
Kong, Qingcong
Li, Zhenhui
Luo, Qiong
Chen, Hao
contents Breast cancer is a major global health concern, and mammography screening plays a central role in early detection. The large volume of screening examinations creates a substantial workload for radiologists, making accurate and consistent report generation a critical clinical challenge. Existing automated mammography report generation methods primarily focus on direct visual-to-text mapping, while overlooking the structured clinical reasoning process followed by radiologists in real-world practice. To address this limitation, we propose MammoRG, a mammography report generation framework that explicitly simulates the clinical reporting workflow by following the BI-RADS guideline and incorporating prior clinical knowledge to produce diagnostic reports. Specifically, MammoRG adopts a two-stage training framework. In the first stage, the model learns to integrate clinically relevant prior knowledge from a patient's four-view mammograms through classification-based supervision. In the second stage, a terminology-aware supervised fine-tuning strategy is introduced to model mammography-specific clinical terms as atomic semantic units, enabling the generation of high-quality reports with improved clinical consistency. To facilitate clinical efficacy evaluation of generated reports, we further develop MammoRGTool, a dedicated mammography report parsing tool that extracts structured clinical information from free-text reports. Extensive experiments demonstrate that MammoRG consistently outperforms existing methods across multiple clinical efficacy metrics, particularly in diagnosis-related BI-RADS F1, where it surpasses the second-best model by 2.73%, 2.04%, 1.90%, and 3.27% on the internal, external 1, external 2, and VinDr-Mammo datasets, respectively.
format Preprint
id arxiv_https___arxiv_org_abs_2605_31093
institution arXiv
publishDate 2026
record_format arxiv
title Cross-Modal Clinical Knowledge Integration for Mammography Report Generation
topic Computer Vision and Pattern Recognition
url https://arxiv.org/abs/2605.31093
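note The abstract describes a parsing tool that extracts structured clinical information from free-text reports, and reports results as BI-RADS F1. The sketch below illustrates that general idea only; it is not MammoRGTool. The regular expression, the function names `extract_birads` and `macro_f1`, and the choice of macro-averaged F1 are all assumptions made for illustration, since the record does not specify how the tool or the metric is defined.

```python
import re
from typing import List, Optional

# Hypothetical pattern (an assumption, not taken from the paper): matches
# phrasings such as "BI-RADS 4", "BIRADS: 2", or "BI-RADS category 4A".
_BIRADS = re.compile(
    r"BI-?RADS\s*(?:category)?\s*[:\-]?\s*([0-6])(?![0-9])",
    re.IGNORECASE,
)


def extract_birads(report: str) -> Optional[str]:
    """Return the last BI-RADS category (0-6) mentioned in a report, or None.

    Subcategories such as 4A/4B/4C are collapsed to "4" in this sketch.
    """
    matches = _BIRADS.findall(report)
    return matches[-1] if matches else None


def macro_f1(refs: List[Optional[str]], preds: List[Optional[str]]) -> float:
    """Macro-averaged F1 over every label seen in refs or preds.

    A missing category (None) is treated as its own label.
    """
    labels = set(refs) | set(preds)
    scores = []
    for label in labels:
        tp = sum(r == label and p == label for r, p in zip(refs, preds))
        fp = sum(r != label and p == label for r, p in zip(refs, preds))
        fn = sum(r == label and p != label for r, p in zip(refs, preds))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    reference_reports = [
        "No suspicious findings. BI-RADS category 1.",
        "Benign calcifications. BIRADS: 2",
        "Irregular mass in the left breast. BI-RADS 4A, biopsy recommended.",
    ]
    generated_reports = [
        "Negative examination. BI-RADS 1",
        "Suspicious mass. BI-RADS category 4",
        "Spiculated mass, left breast. BI-RADS 4B.",
    ]
    refs = [extract_birads(r) for r in reference_reports]
    preds = [extract_birads(r) for r in generated_reports]
    print(refs, preds, round(macro_f1(refs, preds), 4))
```

Comparing the parsed categories of a generated report against those of its reference report is one way a clinical-efficacy score could be computed from free text. A real tool would need to handle negation, laterality, and per-breast assessments, which this sketch ignores.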