Bibliographic Details
Main Authors: Buckley, Brian, O'Hagan, Adrian, Galligan, Marie
Format: Preprint
Published: 2023
Subjects: Applications
Online Access: https://arxiv.org/abs/2304.03733
author Buckley, Brian
O'Hagan, Adrian
Galligan, Marie
contents We investigate the performance and characteristics of currently available variational Bayes (VB) and Markov chain Monte Carlo (MCMC) software to assess the practicability of available approaches and to provide guidance for clinical practitioners. Two case studies covering a variety of real-world data are used to explore the methods in depth. First, we use the publicly available Pima Indian diabetes data to comprehensively compare VB implementations of logistic regression. Second, a large real-world data set, Optum(TM) EHR, with approximately one million diabetes patients, extends the analysis to large, highly unbalanced data containing both discrete and continuous variables. A Bayesian patient phenotyping composite model incorporating latent class analysis (LCA) and regression was implemented for the second case study. We find that several characteristics common in clinical data, such as sparsity, significantly affect the posterior accuracy of automatic VB methods compared with conditionally conjugate mean-field methods. For both models, automatic VB approaches require more effort and technical knowledge to set up for accurate posterior estimation and are very sensitive to stopping time compared with closed-form VB methods. Our results indicate that the patient phenotyping composite Bayes model is more easily usable for real-world studies if Monte Carlo is replaced with VB. It could become a uniquely useful tool for decision support, especially for rare diseases where gold-standard biomarker data are sparse but prior knowledge can be used to assist model diagnosis and may suggest when biomarker tests are warranted.
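As an illustration of what a "closed-form" VB method for logistic regression can look like, the sketch below assumes the Jaakkola-Jordan local variational bound, whose coordinate-ascent updates coincide with conditionally conjugate mean-field VB under Polya-Gamma data augmentation. It is not the authors' implementation: the function names (`vb_logistic`, `lam`, `mat_inv`), the zero-mean isotropic Gaussian prior, the default `prior_var`, and the stopping rule (largest change in the posterior mean below `tol`) are hypothetical choices made for this example. Pure Python is used so the block has no dependencies.

```python
import math


def lam(xi):
    """Jaakkola-Jordan coefficient lambda(xi) = tanh(xi/2) / (4 xi).

    2 * lam(xi) equals E[omega] for omega ~ PolyaGamma(1, xi), which is why the
    bound-based updates match the Polya-Gamma mean-field updates.
    """
    if abs(xi) < 1e-8:
        return 0.125  # limit of tanh(xi/2) / (4 xi) as xi -> 0
    return math.tanh(xi / 2.0) / (4.0 * xi)


def mat_inv(a):
    """Invert a small dense matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        piv = m[c][c]
        m[c] = [v / piv for v in m[c]]
        for r in range(n):
            if r != c:
                f = m[r][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [row[n:] for row in m]


def vb_logistic(X, y, prior_var=10.0, tol=1e-6, max_iter=200):
    """Closed-form mean-field VB for logistic regression (hypothetical helper).

    Assumed prior: w ~ N(0, prior_var * I); y_i in {0, 1}.
    Returns (posterior mean, posterior covariance, iterations used).
    """
    n, p = len(X), len(X[0])
    xi = [1.0] * n  # local variational parameters, one per observation
    m = [0.0] * p
    # b = sum_i (y_i - 1/2) x_i is fixed because the prior mean is zero
    b = [sum((y[i] - 0.5) * X[i][k] for i in range(n)) for k in range(p)]
    for it in range(1, max_iter + 1):
        # Posterior precision: S0^{-1} + 2 * sum_i lambda(xi_i) x_i x_i^T
        prec = [[(1.0 / prior_var if j == k else 0.0) for k in range(p)] for j in range(p)]
        for i in range(n):
            w = 2.0 * lam(xi[i])
            for j in range(p):
                for k in range(p):
                    prec[j][k] += w * X[i][j] * X[i][k]
        S = mat_inv(prec)
        m_new = [sum(S[j][k] * b[k] for k in range(p)) for j in range(p)]
        # Update each xi_i: xi_i^2 = x_i^T (S + m m^T) x_i
        for i in range(n):
            x = X[i]
            quad = sum(x[j] * (S[j][k] + m_new[j] * m_new[k]) * x[k]
                       for j in range(p) for k in range(p))
            xi[i] = math.sqrt(quad)
        # Stopping rule: largest absolute change in the posterior mean
        delta = max(abs(a - c) for a, c in zip(m_new, m))
        m = m_new
        if delta < tol:
            break
    return m, S, it


if __name__ == "__main__":
    # Toy data (made up for illustration): intercept column plus one feature
    X = [[1.0, -2.0], [1.0, -1.0], [1.0, 0.5], [1.0, 2.0]]
    y = [0, 0, 1, 1]
    mean, cov, iters = vb_logistic(X, y)
    print(mean, iters)
```

Each iteration is a deterministic closed-form update rather than a stochastic-gradient step, so the only stopping decision here is the tolerance on the posterior mean; the abstract reports that automatic VB methods, by contrast, are very sensitive to stopping time.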
format Preprint
id arxiv_https___arxiv_org_abs_2304_03733
institution arXiv
publishDate 2023
record_format arxiv
title Variational Bayes latent class approach for EHR-based phenotyping with large real-world data
topic Applications
url https://arxiv.org/abs/2304.03733