Staff View: :: Library Catalog

Saved in:

Bibliographic Details
Main Authors:	Bayati, Mohsen, Hamidi, Nima, Johari, Ramesh, Khosravi, Khashayar
Format:	Preprint
Published:	2020
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2002.10121
Tags:	Add Tag No Tags, Be the first to tag this record!

_version_	1866916167379058688
author	Bayati, Mohsen Hamidi, Nima Johari, Ramesh Khosravi, Khashayar
author_facet	Bayati, Mohsen Hamidi, Nima Johari, Ramesh Khosravi, Khashayar
contents	We investigate a Bayesian $k$-armed bandit problem in the \emph{many-armed} regime, where $k \geq \sqrt{T}$ and $T$ represents the time horizon. Initially, and aligned with recent literature on many-armed bandit problems, we observe that subsampling plays a key role in designing optimal algorithms; the conventional UCB algorithm is sub-optimal, whereas a subsampled UCB (SS-UCB), which selects $Θ(\sqrt{T})$ arms for execution under the UCB framework, achieves rate-optimality. However, despite SS-UCB's theoretical promise of optimal regret, it empirically underperforms compared to a greedy algorithm that consistently chooses the empirically best arm. This observation extends to contextual settings through simulations with real-world data. Our findings suggest a new form of \emph{free exploration} beneficial to greedy algorithms in the many-armed context, fundamentally linked to a tail event concerning the prior distribution of arm rewards. This finding diverges from the notion of free exploration, which relates to covariate variation, as recently discussed in contextual bandit literature. Expanding upon these insights, we establish that the subsampled greedy approach not only achieves rate-optimality for Bernoulli bandits within the many-armed regime but also attains sublinear regret across broader distributions. Collectively, our research indicates that in the many-armed regime, practitioners might find greater value in adopting greedy algorithms.
format	Preprint
id	arxiv_https___arxiv_org_abs_2002_10121
institution	arXiv
publishDate	2020
record_format	arxiv
spellingShingle	The Unreasonable Effectiveness of Greedy Algorithms in Multi-Armed Bandit with Many Arms Bayati, Mohsen Hamidi, Nima Johari, Ramesh Khosravi, Khashayar Machine Learning We investigate a Bayesian $k$-armed bandit problem in the \emph{many-armed} regime, where $k \geq \sqrt{T}$ and $T$ represents the time horizon. Initially, and aligned with recent literature on many-armed bandit problems, we observe that subsampling plays a key role in designing optimal algorithms; the conventional UCB algorithm is sub-optimal, whereas a subsampled UCB (SS-UCB), which selects $Θ(\sqrt{T})$ arms for execution under the UCB framework, achieves rate-optimality. However, despite SS-UCB's theoretical promise of optimal regret, it empirically underperforms compared to a greedy algorithm that consistently chooses the empirically best arm. This observation extends to contextual settings through simulations with real-world data. Our findings suggest a new form of \emph{free exploration} beneficial to greedy algorithms in the many-armed context, fundamentally linked to a tail event concerning the prior distribution of arm rewards. This finding diverges from the notion of free exploration, which relates to covariate variation, as recently discussed in contextual bandit literature. Expanding upon these insights, we establish that the subsampled greedy approach not only achieves rate-optimality for Bernoulli bandits within the many-armed regime but also attains sublinear regret across broader distributions. Collectively, our research indicates that in the many-armed regime, practitioners might find greater value in adopting greedy algorithms.
title	The Unreasonable Effectiveness of Greedy Algorithms in Multi-Armed Bandit with Many Arms
topic	Machine Learning
url	https://arxiv.org/abs/2002.10121

Similar Items