Bibliographic Details
Main Authors: Hahn, Georg, Schneeweiss, Sebastian, Wang, Shirley
Format: Preprint
Published: 2025
Subjects:
Online Access: https://arxiv.org/abs/2503.06308
_version_ 1866909531826552832
author Hahn, Georg
Schneeweiss, Sebastian
Wang, Shirley
author_facet Hahn, Georg
Schneeweiss, Sebastian
Wang, Shirley
contents Computable phenotypes are used to characterize patients and identify outcomes in studies conducted using healthcare claims and electronic health record data. Chart review studies establish reference labels against which computable phenotypes are compared to understand their measurement characteristics (the quantity of interest), for instance the positive predictive value. We describe a method to adaptively evaluate a quantity of interest over sequential samples of charts, with the goal of minimizing the number of charts reviewed. With the help of a simultaneous confidence band, we stop the review once the confidence band meets a pre-specified stopping threshold. The contribution of this article is threefold. First, we test an adaptive approach called Neyman's sampling of charts against random or stratified random sampling. Second, we propose frequentist confidence bands and Bayesian credible intervals to sequentially evaluate the quantity of interest. Third, we propose a tool to predict the stopping time (defined as the number of charts reviewed) at which the chart review would be complete. We observe that Bayesian credible intervals are tighter than their frequentist confidence band counterparts. Moreover, we observe that simple random sampling often performs similarly to Neyman's sampling.
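The two ideas named in the abstract, Neyman's sampling and stopping once an interval meets a threshold, can be illustrated with a minimal sketch. This is not the authors' implementation. It rests on several assumptions: the function names are hypothetical, the prior is a uniform Beta(1, 1), the stopping threshold is taken to be a maximum interval width, and the interval is a pointwise credible interval computed by Monte Carlo. A pointwise interval checked after every wave is not a simultaneous band, so the sketch only shows the shape of the procedure.

```python
import math
import random


def neyman_allocation(stratum_sizes, stratum_sds, n_total):
    """Split n_total charts across strata in proportion to N_h * S_h."""
    weights = [n * s for n, s in zip(stratum_sizes, stratum_sds)]
    total = sum(weights)
    if total == 0:
        # No variability information yet: fall back to proportional allocation.
        weights = list(stratum_sizes)
        total = sum(weights)
    raw = [n_total * w / total for w in weights]
    alloc = [math.floor(r) for r in raw]
    # Give the charts lost to rounding to the largest fractional parts.
    leftover = n_total - sum(alloc)
    order = sorted(range(len(raw)), key=lambda i: raw[i] - alloc[i], reverse=True)
    for i in order[:leftover]:
        alloc[i] += 1
    return alloc


def beta_credible_interval(successes, failures, level=0.95, draws=20000, seed=0):
    """Equal-tailed credible interval for a proportion (assumed Beta(1, 1) prior)."""
    rng = random.Random(seed)
    samples = sorted(
        rng.betavariate(1 + successes, 1 + failures) for _ in range(draws)
    )
    lo = samples[int((1 - level) / 2 * draws)]
    hi = samples[int((1 + level) / 2 * draws) - 1]
    return lo, hi


def review_in_waves(labels, wave_size=25, max_width=0.15):
    """Review charts wave by wave; stop once the interval is narrow enough.

    labels holds 1 for a chart that confirms the phenotype and 0 otherwise,
    so the running proportion estimates the positive predictive value.
    """
    confirmed = reviewed = 0
    lo, hi = 0.0, 1.0
    for start in range(0, len(labels), wave_size):
        wave = labels[start:start + wave_size]
        confirmed += sum(wave)
        reviewed += len(wave)
        lo, hi = beta_credible_interval(confirmed, reviewed - confirmed)
        if hi - lo <= max_width:  # assumed form of the stopping threshold
            break
    return reviewed, (lo, hi)


if __name__ == "__main__":
    # Hypothetical chart labels with a positive predictive value of 0.8.
    charts = [1, 1, 1, 1, 0] * 200
    n_reviewed, interval = review_in_waves(charts)
    print(n_reviewed, interval)
    # Two equally sized strata; the more variable one receives more charts.
    print(neyman_allocation([100, 100], [0.5, 0.1], 60))
```

In a stratified design, the per-stratum standard deviations passed to `neyman_allocation` would be re-estimated from the charts reviewed so far before each new wave, which is what makes the allocation adaptive.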
format Preprint
id arxiv_https___arxiv_org_abs_2503_06308
institution arXiv
publishDate 2025
record_format arxiv
title Adaptive multi-wave sampling for efficient chart validation
topic Applications
url https://arxiv.org/abs/2503.06308