Bibliographic Details
Main Authors: Zhou, Zeyu, Hajek, Bruce, Choi, Nakjung, Walid, Anwar
Format: Preprint
Published: 2022
Subjects: Machine Learning, Computation
Online Access: https://arxiv.org/abs/2203.08082
author Zhou, Zeyu
Hajek, Bruce
Choi, Nakjung
Walid, Anwar
contents This paper proposes regenerative particle Thompson sampling (RPTS), a flexible variation of Thompson sampling. Thompson sampling itself is a Bayesian heuristic for solving stochastic bandit problems, but it is hard to implement in practice due to the intractability of maintaining a continuous posterior distribution. Particle Thompson sampling (PTS) is an approximation of Thompson sampling obtained by simply replacing the continuous distribution by a discrete distribution supported at a set of weighted static particles. We observe that in PTS, the weights of all but a few fit particles converge to zero. RPTS is based on the heuristic: delete the decaying unfit particles and regenerate new particles in the vicinity of fit surviving particles. Empirical evidence shows uniform improvement from PTS to RPTS and flexibility and efficacy of RPTS across a set of representative bandit problems, including an application to 5G network slicing.
format Preprint
id arxiv_https___arxiv_org_abs_2203_08082
institution arXiv
publishDate 2022
record_format arxiv
title Regenerative Particle Thompson Sampling
topic Machine Learning
Computation
url https://arxiv.org/abs/2203.08082
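
The abstract's description of PTS and the RPTS heuristic can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a three-armed Bernoulli bandit, Gaussian perturbation for regeneration, a uniform weight reset after regeneration, and a simple trigger (the least-fit half of the particles holding less than `w_min` of the total weight). All function names and parameters (`pts_step`, `regenerate`, `run`, `frac_delete`, `noise`, `w_min`) are hypothetical.

```python
import numpy as np


def pts_step(rng, particles, weights, true_means):
    """One round of particle Thompson sampling on a Bernoulli bandit."""
    # Draw one particle (a candidate vector of arm means) by its weight.
    idx = rng.choice(len(particles), p=weights)
    # Play the arm that looks best under the sampled particle.
    arm = int(np.argmax(particles[idx]))
    reward = int(rng.random() < true_means[arm])
    # Bayes update: scale each weight by the likelihood of the observation.
    p = particles[:, arm]
    likelihood = p if reward == 1 else 1.0 - p
    weights = weights * likelihood
    return arm, reward, weights / weights.sum()


def regenerate(rng, particles, weights, frac_delete=0.5, noise=0.05):
    """Replace the least-fit particles with perturbed copies of fit ones."""
    n, k = particles.shape
    n_del = int(frac_delete * n)
    order = np.argsort(weights)              # ascending by weight
    unfit, fit = order[:n_del], order[n_del:]
    # Assumption: parents are drawn from survivors in proportion to weight.
    fit_w = weights[fit] / weights[fit].sum()
    parents = rng.choice(fit, size=n_del, p=fit_w)
    # Assumption: new particles are Gaussian perturbations of their parents.
    children = particles[parents] + noise * rng.standard_normal((n_del, k))
    particles = particles.copy()
    particles[unfit] = np.clip(children, 1e-3, 1.0 - 1e-3)
    # Assumption (simplification): reset all weights to uniform.
    return particles, np.full(n, 1.0 / n)


def run(T=2000, n_particles=50, seed=0, regenerative=True, w_min=1e-3):
    rng = np.random.default_rng(seed)
    true_means = np.array([0.3, 0.5, 0.7])   # hypothetical bandit instance
    particles = np.clip(rng.random((n_particles, 3)), 1e-3, 1.0 - 1e-3)
    weights = np.full(n_particles, 1.0 / n_particles)
    total_reward = 0
    for _ in range(T):
        arm, reward, weights = pts_step(rng, particles, weights, true_means)
        total_reward += reward
        # Hypothetical trigger: the least-fit half carries negligible weight.
        low_mass = np.sort(weights)[: n_particles // 2].sum()
        if regenerative and low_mass < w_min:
            particles, weights = regenerate(rng, particles, weights)
    return total_reward, particles, weights
```

Calling `run(regenerative=False)` gives plain PTS with static particles, and `run(regenerative=True)` adds the delete-and-regenerate step, so the two can be compared on the same seed.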