Bibliographic Details
Main Authors: Bax, Eric; Shtoff, Alex
Format: Preprint
Published: 2024
Subjects: Methodology; Machine Learning
Online Access: https://arxiv.org/abs/2408.02821
_version_ 1866912649200009216
author Bax, Eric
Shtoff, Alex
author_facet Bax, Eric
Shtoff, Alex
contents AB testing evaluates the difference between a control and a treatment in a statistically rigorous manner. Continuous monitoring allows statistical evaluation of an AB test as it proceeds. One goal of continuous monitoring is early stopping -- confirming a statistically significant difference between control and treatment as soon as possible. Another goal is to maintain some statistical capability to discover significant differences later in the test if they cannot be confirmed earlier. These goals are in conflict -- looser requirements for early stopping leave us with more stringent ones for later. This paper shows that it is impossible to maintain a constant requirement for significance for tests that have no a priori stopping time, but we can come arbitrarily close to that goal by using tests that require repeated significant results to confirm statistically significant differences between treatment and control.
format Preprint
id arxiv_https___arxiv_org_abs_2408_02821
institution arXiv
publishDate 2024
record_format arxiv
spellingShingle Steady Continuous Monitoring is (Just Barely) Impossible for Tests of Unbounded Length
Bax, Eric
Shtoff, Alex
Methodology
Machine Learning
AB testing evaluates the difference between a control and a treatment in a statistically rigorous manner. Continuous monitoring allows statistical evaluation of an AB test as it proceeds. One goal of continuous monitoring is early stopping -- confirming a statistically significant difference between control and treatment as soon as possible. Another goal is to maintain some statistical capability to discover significant differences later in the test if they cannot be confirmed earlier. These goals are in conflict -- looser requirements for early stopping leave us with more stringent ones for later. This paper shows that it is impossible to maintain a constant requirement for significance for tests that have no a priori stopping time, but we can come arbitrarily close to that goal by using tests that require repeated significant results to confirm statistically significant differences between treatment and control.
title Steady Continuous Monitoring is (Just Barely) Impossible for Tests of Unbounded Length
topic Methodology
Machine Learning
url https://arxiv.org/abs/2408.02821