Staff View: :: Library Catalog

Bibliographic Details
Main Authors:	Yu, Congjing; Ye, Jing; Liu, Yang; Zhang, Xiaodong; Zhang, Zhiyong
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2506.19439

author	Yu, Congjing; Ye, Jing; Liu, Yang; Zhang, Xiaodong; Zhang, Zhiyong
contents	Multimodal medical analysis combining image and tabular data has gained increasing attention. However, effective fusion remains challenging due to cross-modal discrepancies in feature dimensions and modality contributions, as well as the noise from high-dimensional tabular inputs. To address these problems, we present AMF-MedIT, an efficient Align-Modulation-Fusion framework for medical image and tabular data integration, particularly under data-scarce conditions. Built upon a self-supervised learning strategy, we introduce the Adaptive Modulation and Fusion (AMF) module, a novel, streamlined fusion paradigm that harmonizes dimension discrepancies and dynamically balances modality contributions. It integrates prior knowledge to guide the allocation of modality contributions in the fusion and employs feature masks together with magnitude and leakage losses to adjust the dimensionality and magnitude of unimodal features. Additionally, we develop FT-Mamba, a powerful tabular encoder leveraging a selective mechanism to handle noisy medical tabular data efficiently. Extensive experiments, including simulations of clinical noise, demonstrate that AMF-MedIT achieves superior accuracy, robustness, and data efficiency across multimodal classification tasks. Interpretability analyses further reveal how FT-Mamba shapes multimodal pretraining and enhances the image encoder's attention, highlighting the practical value of our framework for reliable and efficient clinical artificial intelligence applications.
format	Preprint
id	arxiv_https___arxiv_org_abs_2506_19439
institution	arXiv
publishDate	2025
record_format	arxiv
title	AMF-MedIT: An Efficient Align-Modulation-Fusion Framework for Medical Image-Tabular Data
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2506.19439
