Saved in:
Bibliographic Details
Main Authors: Sharma, Nihal, Basu, Soumya, Shanmugam, Karthikeyan, Shakkottai, Sanjay
Format: Preprint
Published: 2020
Subjects:
Online Access:https://arxiv.org/abs/2002.08405
Tags: Add Tag
No Tags, Be the first to tag this record!
_version_ 1866912088320901120
author Sharma, Nihal
Basu, Soumya
Shanmugam, Karthikeyan
Shakkottai, Sanjay
author_facet Sharma, Nihal
Basu, Soumya
Shanmugam, Karthikeyan
Shakkottai, Sanjay
contents We study a variant of the bandit problem where side information in the form of bounds on the mean of each arm is provided. We prove that these translate to tighter estimates of subgaussian factors and develop novel algorithms that exploit these estimates. In the linear setting, we present the Restricted-set OFUL (R-OFUL) algorithm that additionally uses the geometric properties of the problem to (potentially) restrict the set of arms being played and reduce exploration rates for suboptimal arms. In the stochastic case, we propose the non-optimistic Global Under-Explore (GLUE) algorithm which employs the inferred subgaussian estimates to adapt the rate of exploration for the arms. We analyze the regret of R-OFUL and GLUE, showing that our regret upper bounds are never worse than that of the standard OFUL and UCB algorithms respectively. Further, we also consider a practically motivated setting of learning from confounded logs where mean bounds appear naturally.
format Preprint
id arxiv_https___arxiv_org_abs_2002_08405
institution arXiv
publishDate 2020
record_format arxiv
spellingShingle Bandits with Mean Bounds
Sharma, Nihal
Basu, Soumya
Shanmugam, Karthikeyan
Shakkottai, Sanjay
Machine Learning
We study a variant of the bandit problem where side information in the form of bounds on the mean of each arm is provided. We prove that these translate to tighter estimates of subgaussian factors and develop novel algorithms that exploit these estimates. In the linear setting, we present the Restricted-set OFUL (R-OFUL) algorithm that additionally uses the geometric properties of the problem to (potentially) restrict the set of arms being played and reduce exploration rates for suboptimal arms. In the stochastic case, we propose the non-optimistic Global Under-Explore (GLUE) algorithm which employs the inferred subgaussian estimates to adapt the rate of exploration for the arms. We analyze the regret of R-OFUL and GLUE, showing that our regret upper bounds are never worse than that of the standard OFUL and UCB algorithms respectively. Further, we also consider a practically motivated setting of learning from confounded logs where mean bounds appear naturally.
title Bandits with Mean Bounds
topic Machine Learning
url https://arxiv.org/abs/2002.08405