Staff View: :: Library Catalog

Saved in:

Bibliographic Details
Main Authors:	Sharma, Nihal, Basu, Soumya, Shanmugam, Karthikeyan, Shakkottai, Sanjay
Format:	Preprint
Published:	2020
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2002.08405
Tags:	Add Tag No Tags, Be the first to tag this record!

_version_	1866912088320901120
author	Sharma, Nihal Basu, Soumya Shanmugam, Karthikeyan Shakkottai, Sanjay
author_facet	Sharma, Nihal Basu, Soumya Shanmugam, Karthikeyan Shakkottai, Sanjay
contents	We study a variant of the bandit problem where side information in the form of bounds on the mean of each arm is provided. We prove that these translate to tighter estimates of subgaussian factors and develop novel algorithms that exploit these estimates. In the linear setting, we present the Restricted-set OFUL (R-OFUL) algorithm that additionally uses the geometric properties of the problem to (potentially) restrict the set of arms being played and reduce exploration rates for suboptimal arms. In the stochastic case, we propose the non-optimistic Global Under-Explore (GLUE) algorithm which employs the inferred subgaussian estimates to adapt the rate of exploration for the arms. We analyze the regret of R-OFUL and GLUE, showing that our regret upper bounds are never worse than that of the standard OFUL and UCB algorithms respectively. Further, we also consider a practically motivated setting of learning from confounded logs where mean bounds appear naturally.
format	Preprint
id	arxiv_https___arxiv_org_abs_2002_08405
institution	arXiv
publishDate	2020
record_format	arxiv
spellingShingle	Bandits with Mean Bounds Sharma, Nihal Basu, Soumya Shanmugam, Karthikeyan Shakkottai, Sanjay Machine Learning We study a variant of the bandit problem where side information in the form of bounds on the mean of each arm is provided. We prove that these translate to tighter estimates of subgaussian factors and develop novel algorithms that exploit these estimates. In the linear setting, we present the Restricted-set OFUL (R-OFUL) algorithm that additionally uses the geometric properties of the problem to (potentially) restrict the set of arms being played and reduce exploration rates for suboptimal arms. In the stochastic case, we propose the non-optimistic Global Under-Explore (GLUE) algorithm which employs the inferred subgaussian estimates to adapt the rate of exploration for the arms. We analyze the regret of R-OFUL and GLUE, showing that our regret upper bounds are never worse than that of the standard OFUL and UCB algorithms respectively. Further, we also consider a practically motivated setting of learning from confounded logs where mean bounds appear naturally.
title	Bandits with Mean Bounds
topic	Machine Learning
url	https://arxiv.org/abs/2002.08405

Similar Items