Bibliographic Details
Main Authors: Yuan, Weixuan, Jin, Zengrui, Wang, Yichen, Xie, Donglin, Ye, Ziyi, Zhang, Chao, Chen, Xuesong
Format: Preprint
Published: 2026
Subjects: Machine Learning, Signal Processing
Online Access: https://arxiv.org/abs/2602.13857
author Yuan, Weixuan
Jin, Zengrui
Wang, Yichen
Xie, Donglin
Ye, Ziyi
Zhang, Chao
Chen, Xuesong
contents Tasks ranging from sleep staging to clinical diagnosis traditionally rely on standard polysomnography (PSG) devices, bedside monitors and wearable devices, which capture diverse nocturnal biosignals (e.g., EEG, EOG, ECG, SpO2). However, heterogeneity across devices and frequent sensor dropout pose significant challenges for unified modelling of these multimodal signals. We present sleep2vec, a foundation model for diverse and incomplete nocturnal biosignals that learns a shared representation via cross-modal alignment. sleep2vec is contrastively pre-trained on 42,249 overnight recordings spanning nine modalities using a Demography, Age, Site & History-aware InfoNCE objective that incorporates physiological and acquisition metadata (e.g., age, gender, recording site) to dynamically weight negatives and mitigate cohort-specific shortcuts. On downstream sleep staging and clinical outcome assessment, sleep2vec consistently outperforms strong baselines and remains robust to any subset of available modalities and sensor dropout. We further characterize, to our knowledge for the first time, scaling laws for nocturnal biosignals with respect to modality diversity and model capacity. Together, these results show that unified cross-modal alignment, coupled with principled scaling, enables label-efficient, general-purpose modelling of real-world nocturnal biosignals.
format Preprint
id arxiv_https___arxiv_org_abs_2602_13857
institution arXiv
publishDate 2026
record_format arxiv
title sleep2vec: Unified Cross-Modal Alignment for Heterogeneous Nocturnal Biosignals
topic Machine Learning
Signal Processing
url https://arxiv.org/abs/2602.13857