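Staff View: CMViM: Contrastive Masked Vim Autoencoder for 3D Multi-modal Representation Learning for AD classification :: Library Catalog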

Bibliographic Details
Main Authors:	Yang, Guangqian; Du, Kangrui; Yang, Zhihan; Du, Ye; Zheng, Yongping; Wang, Shujun
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2403.16520

_version_	1866910381673283584
author	Yang, Guangqian; Du, Kangrui; Yang, Zhihan; Du, Ye; Zheng, Yongping; Wang, Shujun
author_facet	Yang, Guangqian; Du, Kangrui; Yang, Zhihan; Du, Ye; Zheng, Yongping; Wang, Shujun
contents	Alzheimer's disease (AD) is an incurable neurodegenerative condition leading to cognitive and functional deterioration. Given the lack of a cure, prompt and precise AD diagnosis is vital, a complex process dependent on multiple factors and multi-modal data. While successful efforts have been made to integrate multi-modal representation learning into medical datasets, scant attention has been given to 3D medical images. In this paper, we propose Contrastive Masked Vim Autoencoder (CMViM), the first efficient representation learning method tailored for 3D multi-modal data. Our proposed framework is built on a masked Vim autoencoder to learn a unified multi-modal representation and the long-range dependencies contained in 3D medical images. We also introduce an intra-modal contrastive learning module to enhance the capability of the multi-modal Vim encoder for modeling the discriminative features in the same modality, and an inter-modal contrastive learning module to alleviate misaligned representation among modalities. Our framework consists of two main steps: 1) incorporate Vision Mamba (Vim) into the masked autoencoder to reconstruct 3D masked multi-modal data efficiently; 2) align the multi-modal representations with contrastive learning mechanisms from both intra-modal and inter-modal aspects. Our framework is pre-trained and validated on the ADNI2 dataset and evaluated on the downstream task of AD classification. The proposed CMViM yields a 2.7% AUC performance improvement compared with other state-of-the-art methods.
format	Preprint
id	arxiv_https___arxiv_org_abs_2403_16520
institution	arXiv
publishDate	2024
record_format	arxiv
title	CMViM: Contrastive Masked Vim Autoencoder for 3D Multi-modal Representation Learning for AD classification
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2403.16520

Similar Items