Bibliographic Details
Main Authors: Mota, Marco Barbero, Still, John M., Gamboa, Jorge L., Strobl, Eric V., Stein, Charles M., Kawai, Vivian K., Lasko, Thomas A.
Format: Preprint
Published: 2025
Subjects:
Online Access: https://arxiv.org/abs/2501.07206
author Mota, Marco Barbero
Still, John M.
Gamboa, Jorge L.
Strobl, Eric V.
Stein, Charles M.
Kawai, Vivian K.
Lasko, Thomas A.
contents Systemic lupus erythematosus (SLE) is a complex, heterogeneous disease with many manifestational facets. We propose a data-driven approach to discover probabilistically independent sources from multimodal, imperfect EHR data. These sources represent exogenous variables in the causal graph of the data-generating process, and they estimate latent root causes of the presence of SLE in the health record. We objectively evaluated the sources against the original variables from which they were discovered by training supervised models to discriminate SLE records from negative health records using a reduced set of labelled instances. We found 19 predictive sources with high clinical validity whose EHR signatures define independent factors of SLE heterogeneity. Using the sources as the input patient data representation enables models to provide rich explanations that better capture the clinical reasons why a particular record is (or is not) an SLE case. Providers may be willing to trade patient-level interpretability for discrimination, especially in challenging cases.
format Preprint
id arxiv_https___arxiv_org_abs_2501_07206
institution arXiv
publishDate 2025
record_format arxiv
title A data-driven approach to discover and quantify systemic lupus erythematosus etiological heterogeneity from electronic health records
topic Machine Learning
Applications
url https://arxiv.org/abs/2501.07206