Saved in:
Bibliographic Details
Main Authors: Jeong, Woojin, Min, Seungki
Format: Preprint
Published: 2024
Subjects:
Online Access:https://arxiv.org/abs/2408.15535
Tags: Add Tag
No Tags, Be the first to tag this record!
_version_ 1866914927185231872
author Jeong, Woojin
Min, Seungki
author_facet Jeong, Woojin
Min, Seungki
contents We consider a Bayesian budgeted multi-armed bandit problem, in which each arm consumes a different amount of resources when selected and there is a budget constraint on the total amount of resources that can be used. Budgeted Thompson Sampling (BTS) offers a very effective heuristic to this problem, but its arm-selection rule does not take into account the remaining budget information. We adopt \textit{Information Relaxation Sampling} framework that generalizes Thompson Sampling for classical $K$-armed bandit problems, and propose a series of algorithms that are randomized like BTS but more carefully optimize their decisions with respect to the budget constraint. In a one-to-one correspondence with these algorithms, a series of performance benchmarks that improve the conventional benchmark are also suggested. Our theoretical analysis and simulation results show that our algorithms (and our benchmarks) make incremental improvements over BTS (respectively, the conventional benchmark) across various settings including a real-world example.
format Preprint
id arxiv_https___arxiv_org_abs_2408_15535
institution arXiv
publishDate 2024
record_format arxiv
spellingShingle Improving Thompson Sampling via Information Relaxation for Budgeted Multi-armed Bandits
Jeong, Woojin
Min, Seungki
Machine Learning
Artificial Intelligence
We consider a Bayesian budgeted multi-armed bandit problem, in which each arm consumes a different amount of resources when selected and there is a budget constraint on the total amount of resources that can be used. Budgeted Thompson Sampling (BTS) offers a very effective heuristic to this problem, but its arm-selection rule does not take into account the remaining budget information. We adopt \textit{Information Relaxation Sampling} framework that generalizes Thompson Sampling for classical $K$-armed bandit problems, and propose a series of algorithms that are randomized like BTS but more carefully optimize their decisions with respect to the budget constraint. In a one-to-one correspondence with these algorithms, a series of performance benchmarks that improve the conventional benchmark are also suggested. Our theoretical analysis and simulation results show that our algorithms (and our benchmarks) make incremental improvements over BTS (respectively, the conventional benchmark) across various settings including a real-world example.
title Improving Thompson Sampling via Information Relaxation for Budgeted Multi-armed Bandits
topic Machine Learning
Artificial Intelligence
url https://arxiv.org/abs/2408.15535