Staff View: Multimodal Foundation Models for Early Disease Detection :: Library Catalog

Bibliographic Details
Main Authors:	Mohsin, Md Talha; Abdulrashid, Ismail
Format:	Preprint
Published:	2025
Subjects:	Machine Learning; Artificial Intelligence; Human-Computer Interaction
Online Access:	https://arxiv.org/abs/2510.01899

_version_	1866908716612190208
author	Mohsin, Md Talha; Abdulrashid, Ismail
author_facet	Mohsin, Md Talha; Abdulrashid, Ismail
contents	Healthcare data now span EHRs, medical imaging, genomics, and wearable sensors, but most diagnostic models still process these modalities in isolation. This limits their ability to capture early, cross-modal disease signatures. This paper introduces a multimodal foundation model built on a transformer architecture that integrates heterogeneous clinical data through modality-specific encoders and cross-modal attention. Each modality is mapped into a shared latent space and fused using multi-head attention with residual normalization. We implement the framework using a multimodal dataset that simulates early-stage disease patterns across EHR sequences, imaging patches, genomic profiles, and wearable signals, including missing-modality scenarios and label noise. The model is trained using supervised classification together with self-supervised reconstruction and contrastive alignment to improve robustness. Experimental evaluation demonstrates strong performance in early-detection settings, with stable classification metrics, reliable uncertainty estimates, and interpretable attention patterns. The approach moves toward a flexible, pretrain-and-fine-tune foundation model that supports precision diagnostics, handles incomplete inputs, and improves early disease detection across oncology, cardiology, and neurology applications.
format	Preprint
id	arxiv_https___arxiv_org_abs_2510_01899
institution	arXiv
publishDate	2025
record_format	arxiv
title	Multimodal Foundation Models for Early Disease Detection
topic	Machine Learning; Artificial Intelligence; Human-Computer Interaction
url	https://arxiv.org/abs/2510.01899
