Bibliographic Details
Main Authors: Tsai, Pi-Ju, Limbud, Charkkri, Chen, Kuan-Fu, Tseng, Yi-Ju
Format: Preprint
Published: 2026
Subjects: Machine Learning
Online Access: https://arxiv.org/abs/2602.03562
author Tsai, Pi-Ju
Limbud, Charkkri
Chen, Kuan-Fu
Tseng, Yi-Ju
contents Electronic Health Records (EHRs) provide high-dimensional temporal data essential for patient modeling; however, conventional algorithmic approaches often rely on data aggregation or imputation, which distorts temporal disease trajectories. Such computational limitations are particularly critical in sepsis, a heterogeneous syndrome where clustering-based stratification plays a key role in identifying clinically distinct phenotypes for precise treatment strategies. Furthermore, existing clustering processes seldom incorporate domain-driven constraints, often resulting in phenotypes that lack clear clinical distinction. We propose a novel clustering network, NPCNet, that comprises a text embedding generator, a clustering operator, and a target navigator. We first transform EHRs into pseudo texts by discretizing continuous clinical measurements, then integrate them with static variables to construct the embeddings. The target navigator then infuses clinical knowledge into the embeddings through auxiliary tasks, constraining clustering results to better align sepsis phenotypes with clinical significance. Finally, the clustering operator employs an iterative refinement mechanism to jointly optimize phenotype centroids and patient representations under domain-driven constraints. Extensive experiments on public datasets validate that NPCNet achieves superior performance on both internal clustering benchmarks and clinical validity metrics, offering a viable pathway for precision treatment strategies in the management of sepsis.
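The pseudo-text step described in the abstract (discretizing continuous clinical measurements and combining them with static variables) can be illustrated with a minimal sketch. This is not the authors' implementation: the bin edges, the labels, the token layout, and the names `BINS`, `discretize`, and `to_pseudo_text` are all hypothetical, chosen only to show the general idea. Time points without a measurement are simply absent from the text, so no aggregation or imputation is needed.

```python
from bisect import bisect_right

# Hypothetical bin edges and labels for illustration only;
# these are NOT the thresholds used in the paper.
BINS = {
    "heart_rate": ([60.0, 100.0], ["low", "normal", "high"]),
    "lactate": ([2.0, 4.0], ["normal", "elevated", "high"]),
}


def discretize(name, value):
    """Map a continuous measurement to a categorical label.

    Each edge is the inclusive lower bound of the next bin,
    so a heart rate of exactly 60.0 falls in "normal".
    """
    edges, labels = BINS[name]
    return labels[bisect_right(edges, value)]


def to_pseudo_text(static, series):
    """Build one pseudo-text string for a patient.

    static: dict of static variables, e.g. {"sex": "F"}
    series: list of (hour, variable_name, value) observations;
            unobserved time points are simply left out.
    """
    # Static variables come first, in a stable (sorted) order.
    tokens = [f"{key}_{val}" for key, val in sorted(static.items())]
    # Temporal observations follow in chronological order.
    for hour, name, value in sorted(series):
        tokens.append(f"t{hour}_{name}_{discretize(name, value)}")
    return " ".join(tokens)


if __name__ == "__main__":
    example = to_pseudo_text(
        {"sex": "F", "age_group": "60s"},
        [(2, "lactate", 4.5), (0, "heart_rate", 112.0), (0, "lactate", 1.8)],
    )
    print(example)
```

A string produced this way could then be passed to any text encoder to obtain an embedding; which encoder NPCNet uses is not stated in this record.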
format Preprint
id arxiv_https___arxiv_org_abs_2602_03562
institution arXiv
publishDate 2026
record_format arxiv
title NPCNet: Navigator-Driven Pseudo Text for Deep Clustering of Early Sepsis Phenotyping
topic Machine Learning
url https://arxiv.org/abs/2602.03562