Bibliographic Details
Main Authors: Kwon, Jeong Eul, Yoon, Joo Heung, Lee, Hyo Kyung
Format: Preprint
Published: 2025
Subjects: Machine Learning
Online Access: https://arxiv.org/abs/2509.22121
_version_ 1866908560878731264
author Kwon, Jeong Eul
Yoon, Joo Heung
Lee, Hyo Kyung
contents Irregular sampling and high missingness are intrinsic challenges in modeling time series derived from electronic health records (EHRs), where clinical variables are measured at uneven intervals depending on workflow and intervention timing. To address this, we propose VITAL, a variable-aware, large language model (LLM)-based framework tailored for learning from irregularly sampled physiological time series. VITAL differentiates between two distinct types of clinical variables: vital signs, which are frequently recorded and exhibit temporal patterns, and laboratory tests, which are measured sporadically and lack temporal structure. It reprograms vital signs into the language space, enabling the LLM to capture temporal context and reason over missing values through explicit encoding. In contrast, laboratory variables are embedded using either representative summary values or a learnable [Not measured] token, depending on their availability. Extensive evaluations on benchmark datasets from PhysioNet demonstrate that VITAL outperforms state-of-the-art methods designed for irregular time series. Furthermore, it maintains robust performance under high levels of missingness, which is prevalent in real-world clinical scenarios where key variables are often unavailable.
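The laboratory-variable branch described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the record does not say which summary statistic is used or how values are projected, so the mean as the "representative summary value", the scalar-to-vector linear projection, and the names `summarize_lab` and `LabEmbedder` are all assumptions made for illustration. The "learnable" token is represented by a plain randomly initialized vector; no training is shown.

```python
import math
import random

NOT_MEASURED = "[Not measured]"


def summarize_lab(values):
    """Return the mean of the observed values, or None if nothing was observed.

    Assumption: the mean stands in for the 'representative summary value';
    the source does not name the statistic.
    """
    observed = [v for v in values if v is not None and not math.isnan(v)]
    if not observed:
        return None
    return sum(observed) / len(observed)


class LabEmbedder:
    """Hypothetical embedder: summary value if available, else a token vector."""

    def __init__(self, dim, seed=0):
        rng = random.Random(seed)
        self.dim = dim
        # Parameters of an assumed scalar -> vector linear projection.
        self.weight = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        self.bias = [0.0] * dim
        # Stand-in for the learnable [Not measured] token embedding.
        self.not_measured_token = [rng.gauss(0.0, 1.0) for _ in range(dim)]

    def embed(self, values):
        """Return (kind, vector) for one laboratory variable."""
        summary = summarize_lab(values)
        if summary is None:
            # The lab was never measured: use the token instead of imputing.
            return NOT_MEASURED, list(self.not_measured_token)
        vector = [w * summary + b for w, b in zip(self.weight, self.bias)]
        return "summary", vector


# Usage: one lab with two observed values, one lab never measured.
embedder = LabEmbedder(dim=4)
kind_a, vec_a = embedder.embed([None, 1.2, float("nan"), 0.8])
kind_b, vec_b = embedder.embed([None, None])
print(kind_a, len(vec_a))
print(kind_b, len(vec_b))
```

The point of the sketch is the branch itself: a variable with no observations is mapped to a dedicated token vector rather than to an imputed number, so downstream layers can distinguish "not measured" from any real value.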
format Preprint
id arxiv_https___arxiv_org_abs_2509_22121
institution arXiv
publishDate 2025
record_format arxiv
title Mind the Missing: Variable-Aware Representation Learning for Irregular EHR Time Series using Large Language Models
topic Machine Learning
url https://arxiv.org/abs/2509.22121