Bibliographic Details
Main Authors: Li, Tong, Mandel, Travis, Phillips, Goldie, Rafferty, Anna, Schwartz, Eric M., Kong, Dehan, Williams, Joseph J.
Format: Preprint
Published: 2026
Subjects: Applications
Online Access: https://arxiv.org/abs/2603.11267
author Li, Tong
Mandel, Travis
Phillips, Goldie
Rafferty, Anna
Schwartz, Eric M.
Kong, Dehan
Williams, Joseph J.
contents Scientific experimentation is largely driven by statistical hypothesis testing to determine whether interventions differ significantly. Traditionally, experimenters allocate samples uniformly across interventions. However, such an approach may lead to suboptimal outcomes; multi-armed bandits (MABs) address this problem by allocating samples adaptively to maximize outcomes. Yet two challenges have hindered the use of MABs in scientific domains. First, common hypothesis tests (e.g., $t$-tests) become invalid under adaptive sampling without correction, leading to inflated type I and type II errors. This is an understudied problem, and prior solutions suffer from issues such as low statistical power, which prevent adoption in many practical settings. Second, practitioners must explicitly balance cumulative reward with statistical efficiency, yet no general methodology exists to quantify this trade-off across algorithms. In this paper, we study assumption modification and critical region correction approaches for hypothesis testing that enable common tests to be applied to adaptively collected data. We provide heuristic justification for their power efficiency and show in simulation that they achieve higher power than existing approaches. Further, we derive a theoretically and practically motivated objective function for adaptive experiment evaluation, which we integrate into a unified experimental framework. Our framework asks experimenters to specify an experiment extension cost for their problem and, based on that cost, uses our proposed optimization procedure to select the bandit algorithm that best balances reward and power in their setting. We show that our approach enables practitioners to improve outcomes with only slightly more steps than uniform randomization, while retaining statistical validity.
format Preprint
id arxiv_https___arxiv_org_abs_2603_11267
institution arXiv
publishDate 2026
record_format arxiv
title A Statistically Reliable Optimization Framework for Bandit Experiments in Scientific Discovery
topic Applications
url https://arxiv.org/abs/2603.11267