Bibliographic Details
Main Authors: Mohsin, Md Talha, Abdulrashid, Ismail
Format: Preprint
Published: 2025
Online Access: https://arxiv.org/abs/2510.01899
_version_ 1866908716612190208
author Mohsin, Md Talha
Abdulrashid, Ismail
author_facet Mohsin, Md Talha
Abdulrashid, Ismail
contents Healthcare data now span EHRs, medical imaging, genomics, and wearable sensors, but most diagnostic models still process these modalities in isolation. This limits their ability to capture early, cross-modal disease signatures. This paper introduces a multimodal foundation model built on a transformer architecture that integrates heterogeneous clinical data through modality-specific encoders and cross-modal attention. Each modality is mapped into a shared latent space and fused using multi-head attention with residual normalization. We implement the framework using a multimodal dataset that simulates early-stage disease patterns across EHR sequences, imaging patches, genomic profiles, and wearable signals, including missing-modality scenarios and label noise. The model is trained using supervised classification together with self-supervised reconstruction and contrastive alignment to improve robustness. Experimental evaluation demonstrates strong performance in early-detection settings, with stable classification metrics, reliable uncertainty estimates, and interpretable attention patterns. The approach moves toward a flexible, pretrain-and-fine-tune foundation model that supports precision diagnostics, handles incomplete inputs, and improves early disease detection across oncology, cardiology, and neurology applications.
format Preprint
id arxiv_https___arxiv_org_abs_2510_01899
institution arXiv
publishDate 2025
record_format arxiv
spellingShingle Multimodal Foundation Models for Early Disease Detection
Mohsin, Md Talha
Abdulrashid, Ismail
Machine Learning
Artificial Intelligence
Human-Computer Interaction
title Multimodal Foundation Models for Early Disease Detection
topic Machine Learning
Artificial Intelligence
Human-Computer Interaction
url https://arxiv.org/abs/2510.01899