MARC21: :: Library Catalog

Bibliographic Details
Main Authors:	Hong, Yunjing; Nelson, Jennifer C.; Williamson, Brian D.
Format:	Preprint
Published:	2026
Subjects:	Methodology; Machine Learning
Online Access:	https://arxiv.org/abs/2604.09913

_version_	1866914464794673152
author	Hong, Yunjing; Nelson, Jennifer C.; Williamson, Brian D.
contents	Accurately identifying patients with specific medical conditions is a key challenge when using clinical data from electronic health records. Our objective was to comprehensively assess when weakly-supervised prediction methods, which use silver-standard labels (proxy measures of the true outcome) rather than gold-standard true labels, perform well in rare-outcome settings like vaccine safety studies. We compared three methods (PheNorm, MAP, and sureLDA) that combine structured features and features derived from clinical text using natural language processing, through an extensive simulation study with data-generating mechanisms ranging from simple to complex, varying outcome rates, and silver labels of varying informativeness. We also considered using predicted probabilities to design a chart review validation study. No single method dominated the others across all prediction performance metrics. Probability-guided sampling selected a cohort enriched for patients with more mentions of important concepts in chart notes. SureLDA, the most complex of the three algorithms we considered, often performed well in simulations. Performance depended greatly on the selected tuning parameters. Care should be taken when using weakly-supervised prediction methods in rare-outcome settings, particularly if the probabilities will be used in downstream analysis, but these methods can work well when silver labels are strong predictors of true outcomes.
format	Preprint
id	arxiv_https___arxiv_org_abs_2604_09913
institution	arXiv
publishDate	2026
record_format	arxiv
title	Performance of weakly-supervised electronic health record-based phenotyping methods in rare-outcome settings
topic	Methodology; Machine Learning
url	https://arxiv.org/abs/2604.09913

Similar Items