Staff View: :: Library Catalog

Bibliographic Details
Main Authors:	Zhou, Jiaying; Jiang, Mingzhou; Wu, Junde; Zhu, Jiayuan; Wang, Ziyue; Jin, Yueming
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2406.00631

_version_	1866929369729990656
author	Zhou, Jiaying; Jiang, Mingzhou; Wu, Junde; Zhu, Jiayuan; Wang, Ziyue; Jin, Yueming
author_facet	Zhou, Jiaying; Jiang, Mingzhou; Wu, Junde; Zhu, Jiayuan; Wang, Ziyue; Jin, Yueming
contents	Medicine is inherently a multimodal discipline. Medical images can reflect the pathological changes of cancer and tumors, while the expression of specific genes can influence their morphological characteristics. However, most deep learning models employed for these medical tasks are unimodal, making predictions using either image data or genomic data exclusively. In this paper, we propose a multimodal pre-training framework that jointly incorporates genomics and medical images for downstream tasks. To address the high computational complexity and the difficulty of capturing long-range dependencies when modeling gene sequences with MLP or Transformer architectures, we use Mamba to model these long genomic sequences. We align medical images and genes using a self-supervised contrastive learning approach that combines Mamba as the genetic encoder and the Vision Transformer (ViT) as the medical image encoder. We pre-trained the model on the TCGA dataset using paired gene expression data and imaging data, and fine-tuned it for downstream tumor segmentation tasks. The results show that our model outperformed a wide range of related methods.
format	Preprint
id	arxiv_https___arxiv_org_abs_2406_00631
institution	arXiv
publishDate	2024
record_format	arxiv
title	MGI: Multimodal Contrastive pre-training of Genomic and Medical Imaging
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2406.00631