Bibliographic Details
Main Authors: Yu, Congjing, Ye, Jing, Liu, Yang, Zhang, Xiaodong, Zhang, Zhiyong
Format: Preprint
Published: 2025
Subjects: Computer Vision and Pattern Recognition
Online Access: https://arxiv.org/abs/2506.19439
author Yu, Congjing
Ye, Jing
Liu, Yang
Zhang, Xiaodong
Zhang, Zhiyong
contents Multimodal medical analysis combining image and tabular data has gained increasing attention. However, effective fusion remains challenging due to cross-modal discrepancies in feature dimensions and modality contributions, as well as noise from high-dimensional tabular inputs. To address these problems, we present AMF-MedIT, an efficient Align-Modulation-Fusion framework for medical image and tabular data integration, particularly under data-scarce conditions. Built upon a self-supervised learning strategy, AMF-MedIT introduces the Adaptive Modulation and Fusion (AMF) module, a novel, streamlined fusion paradigm that harmonizes dimension discrepancies and dynamically balances modality contributions. The AMF module integrates prior knowledge to guide the allocation of modality contributions during fusion, and it employs feature masks together with magnitude and leakage losses to adjust the dimensionality and magnitude of unimodal features. Additionally, we develop FT-Mamba, a powerful tabular encoder that leverages a selective mechanism to handle noisy medical tabular data efficiently. Extensive experiments, including simulations of clinical noise, demonstrate that AMF-MedIT achieves superior accuracy, robustness, and data efficiency across multimodal classification tasks. Interpretability analyses further reveal how FT-Mamba shapes multimodal pretraining and enhances the image encoder's attention, highlighting the practical value of our framework for reliable and efficient clinical artificial intelligence applications.
format Preprint
id arxiv_https___arxiv_org_abs_2506_19439
institution arXiv
publishDate 2025
record_format arxiv
title AMF-MedIT: An Efficient Align-Modulation-Fusion Framework for Medical Image-Tabular Data
topic Computer Vision and Pattern Recognition
url https://arxiv.org/abs/2506.19439