Bibliographic Details
Main Authors: Yi, Ziruo, Xiao, Ting, Albert, Mark V.
Format: Preprint
Published: 2025
Subjects:
Online Access: https://arxiv.org/abs/2505.09787
author Yi, Ziruo
Xiao, Ting
Albert, Mark V.
contents Radiology report generation (RRG) aims to automatically produce diagnostic reports from medical images, with the potential to enhance clinical workflows and reduce radiologists' workload. While recent approaches leveraging multimodal large language models (MLLMs) and retrieval-augmented generation (RAG) have achieved strong results, they continue to face challenges such as factual inconsistency, hallucination, and cross-modal misalignment. We propose a multimodal multi-agent framework for RRG that aligns with the stepwise clinical reasoning workflow, where task-specific agents handle retrieval, draft generation, visual analysis, refinement, and synthesis. Experimental results demonstrate that our approach outperforms a strong baseline in both automatic metrics and LLM-based evaluations, producing more accurate, structured, and interpretable reports. This work highlights the potential of clinically aligned multi-agent frameworks to support explainable and trustworthy clinical AI applications.
format Preprint
id arxiv_https___arxiv_org_abs_2505_09787
institution arXiv
publishDate 2025
record_format arxiv
title A Multimodal Multi-Agent Framework for Radiology Report Generation
topic Artificial Intelligence
url https://arxiv.org/abs/2505.09787