Staff View: :: Library Catalog

Bibliographic Details
Main Authors:	Li, Tong; Mandel, Travis; Phillips, Goldie; Rafferty, Anna; Schwartz, Eric M.; Kong, Dehan; Williams, Joseph J.
Format:	Preprint
Published:	2026
Subjects:	Applications
Online Access:	https://arxiv.org/abs/2603.11267

_version_	1866917348310515712
author	Li, Tong; Mandel, Travis; Phillips, Goldie; Rafferty, Anna; Schwartz, Eric M.; Kong, Dehan; Williams, Joseph J.
author_facet	Li, Tong; Mandel, Travis; Phillips, Goldie; Rafferty, Anna; Schwartz, Eric M.; Kong, Dehan; Williams, Joseph J.
contents	Scientific experimentation is largely driven by statistical hypothesis testing to determine significant differences in interventions. Traditionally, experimenters allocate samples uniformly across interventions. However, such an approach may lead to suboptimal outcomes. Multi-armed bandits (MABs) address this problem by allocating samples adaptively to maximize outcomes. Yet, two challenges have hindered the use of MABs in scientific domains. First, common hypothesis tests (e.g., $t$-tests) become invalid under adaptive sampling without correction, leading to inflated type I and type II errors. This is an understudied problem, and prior solutions suffer from issues such as low statistical power, which prevents adoption in many practical settings. Second, practitioners must explicitly balance cumulative reward with statistical efficiency, yet no general methodology exists to quantify this trade-off across algorithms. In this paper, we study assumption modification and critical region correction approaches for hypothesis testing that enable common tests to be applied to adaptively collected data. We provide heuristic justification for the power efficiency of these approaches and show in simulation that they achieve higher power than existing approaches. Further, we derive a theoretically and practically motivated objective function for adaptive experiment evaluation, which we integrate into a unified experimental framework. Our framework asks experimenters to specify an experiment extension cost for their problem and, based on that cost, enables our proposed optimization procedure to select the bandit algorithm that best balances reward and power in their setting. We show that our approach enables practitioners to improve outcomes with only slightly more steps than uniform randomization, while retaining statistical validity.
format	Preprint
id	arxiv_https___arxiv_org_abs_2603_11267
institution	arXiv
publishDate	2026
record_format	arxiv
title	A Statistically Reliable Optimization Framework for Bandit Experiments in Scientific Discovery
topic	Applications
url	https://arxiv.org/abs/2603.11267

Similar Items