Staff View: :: Library Catalog

Saved in:

Bibliographic Details
Main Authors:	Jeong, Woojin, Min, Seungki
Format:	Preprint
Published:	2024
Subjects:	Machine Learning Artificial Intelligence
Online Access:	https://arxiv.org/abs/2408.15535
Tags:	Add Tag No Tags, Be the first to tag this record!

_version_	1866914927185231872
author	Jeong, Woojin Min, Seungki
author_facet	Jeong, Woojin Min, Seungki
contents	We consider a Bayesian budgeted multi-armed bandit problem, in which each arm consumes a different amount of resources when selected and there is a budget constraint on the total amount of resources that can be used. Budgeted Thompson Sampling (BTS) offers a very effective heuristic to this problem, but its arm-selection rule does not take into account the remaining budget information. We adopt \textit{Information Relaxation Sampling} framework that generalizes Thompson Sampling for classical $K$-armed bandit problems, and propose a series of algorithms that are randomized like BTS but more carefully optimize their decisions with respect to the budget constraint. In a one-to-one correspondence with these algorithms, a series of performance benchmarks that improve the conventional benchmark are also suggested. Our theoretical analysis and simulation results show that our algorithms (and our benchmarks) make incremental improvements over BTS (respectively, the conventional benchmark) across various settings including a real-world example.
format	Preprint
id	arxiv_https___arxiv_org_abs_2408_15535
institution	arXiv
publishDate	2024
record_format	arxiv
spellingShingle	Improving Thompson Sampling via Information Relaxation for Budgeted Multi-armed Bandits Jeong, Woojin Min, Seungki Machine Learning Artificial Intelligence We consider a Bayesian budgeted multi-armed bandit problem, in which each arm consumes a different amount of resources when selected and there is a budget constraint on the total amount of resources that can be used. Budgeted Thompson Sampling (BTS) offers a very effective heuristic to this problem, but its arm-selection rule does not take into account the remaining budget information. We adopt \textit{Information Relaxation Sampling} framework that generalizes Thompson Sampling for classical $K$-armed bandit problems, and propose a series of algorithms that are randomized like BTS but more carefully optimize their decisions with respect to the budget constraint. In a one-to-one correspondence with these algorithms, a series of performance benchmarks that improve the conventional benchmark are also suggested. Our theoretical analysis and simulation results show that our algorithms (and our benchmarks) make incremental improvements over BTS (respectively, the conventional benchmark) across various settings including a real-world example.
title	Improving Thompson Sampling via Information Relaxation for Budgeted Multi-armed Bandits
topic	Machine Learning Artificial Intelligence
url	https://arxiv.org/abs/2408.15535

Similar Items