Table of Contents: :: Library Catalog

Saved in:

Bibliographic Details
Main Authors:	Zhang, Xi, Meng, Zaiqiao, Lever, Jake, Ho, Edmond S. L.
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition Artificial Intelligence Computation and Language Machine Learning I.2.10; J.3; I.5.4
Online Access:	https://arxiv.org/abs/2411.19378
Tags:	Add Tag No Tags, Be the first to tag this record!

Table of Contents:

Radiology report generation (RRG) requires advanced medical image analysis, effective temporal reasoning, and accurate text generation. While multimodal large language models (MLLMs) align with pre-trained vision encoders to enhance visual-language understanding, most existing methods rely on single-image analysis or rule-based heuristics to process multiple images, failing to fully leverage temporal information in multi-modal medical datasets. In this paper, we introduce Libra, a temporal-aware MLLM tailored for chest X-ray report generation. Libra combines a radiology-specific image encoder with a novel Temporal Alignment Connector (TAC), designed to accurately capture and integrate temporal differences between paired current and prior images. Extensive experiments on the MIMIC-CXR dataset demonstrate that Libra establishes a new state-of-the-art benchmark among similarly scaled MLLMs, setting new standards in both clinical relevance and lexical accuracy.

Similar Items