Staff View: Regenerative Particle Thompson Sampling :: Library Catalog

Bibliographic Details
Main Authors:	Zhou, Zeyu; Hajek, Bruce; Choi, Nakjung; Walid, Anwar
Format:	Preprint
Published:	2022
Subjects:	Machine Learning; Computation
Online Access:	https://arxiv.org/abs/2203.08082

_version_	1866913204150468608
author	Zhou, Zeyu; Hajek, Bruce; Choi, Nakjung; Walid, Anwar
author_facet	Zhou, Zeyu; Hajek, Bruce; Choi, Nakjung; Walid, Anwar
contents	This paper proposes regenerative particle Thompson sampling (RPTS), a flexible variation of Thompson sampling. Thompson sampling itself is a Bayesian heuristic for solving stochastic bandit problems, but it is hard to implement in practice due to the intractability of maintaining a continuous posterior distribution. Particle Thompson sampling (PTS) is an approximation of Thompson sampling obtained by simply replacing the continuous distribution by a discrete distribution supported at a set of weighted static particles. We observe that in PTS, the weights of all but a few fit particles converge to zero. RPTS is based on the heuristic: delete the decaying unfit particles and regenerate new particles in the vicinity of fit surviving particles. Empirical evidence shows uniform improvement from PTS to RPTS and flexibility and efficacy of RPTS across a set of representative bandit problems, including an application to 5G network slicing.
format	Preprint
id	arxiv_https___arxiv_org_abs_2203_08082
institution	arXiv
publishDate	2022
record_format	arxiv
title	Regenerative Particle Thompson Sampling
topic	Machine Learning; Computation
url	https://arxiv.org/abs/2203.08082
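
Illustrative sketch (not part of the catalog record): the contents field describes PTS as Thompson sampling over a weighted set of static particles, and RPTS as deleting the decaying unfit particles and regenerating new ones near the fit survivors. The Python below is a minimal sketch of that idea for a Bernoulli bandit, not the authors' implementation. The function names, the trigger threshold, the deleted fraction, the Gaussian perturbation, and the uniform weight reset after regeneration are all assumptions made for illustration.

```python
import random


def pts_step(particles, weights, true_means, rng):
    """One round of particle Thompson sampling on a Bernoulli bandit."""
    # Draw one particle according to the current weights.
    idx = rng.choices(range(len(particles)), weights=weights, k=1)[0]
    sampled = particles[idx]
    # Play the arm that looks best under the sampled particle.
    arm = max(range(len(sampled)), key=lambda a: sampled[a])
    reward = 1 if rng.random() < true_means[arm] else 0
    # Bayes update: scale each weight by the likelihood of the observed reward.
    new_w = [w * (p[arm] if reward else 1.0 - p[arm])
             for w, p in zip(weights, particles)]
    total = sum(new_w)
    if total > 0:
        weights = [w / total for w in new_w]
    else:
        # Numerical fallback (assumption): reset to uniform if all weights vanish.
        weights = [1.0 / len(particles)] * len(particles)
    return weights, arm, reward


def regenerate(particles, weights, rng, frac=0.5, scale=0.05, eps=1e-3):
    """Delete the lowest-weight particles and respawn them near survivors.

    frac, scale, and eps are hypothetical tuning parameters.
    """
    n = len(particles)
    order = sorted(range(n), key=lambda i: weights[i], reverse=True)
    n_keep = max(1, n - int(frac * n))
    survivors = [particles[i] for i in order[:n_keep]]
    surv_w = [weights[i] for i in order[:n_keep]]
    new_particles = list(survivors)
    for _ in range(n - n_keep):
        # Pick a fit parent, then perturb it with clipped Gaussian noise.
        parent = rng.choices(survivors, weights=surv_w, k=1)[0]
        child = [min(1.0 - eps, max(eps, x + rng.gauss(0.0, scale)))
                 for x in parent]
        new_particles.append(child)
    # Assumption: restart from uniform weights after regeneration.
    return new_particles, [1.0 / n] * n


def run_rpts(true_means, n_particles=20, horizon=500, seed=0,
             threshold=1e-3, frac=0.5):
    """Run the sketch; returns final particles, weights, and total reward."""
    rng = random.Random(seed)
    k = len(true_means)
    particles = [[rng.uniform(0.01, 0.99) for _ in range(k)]
                 for _ in range(n_particles)]
    weights = [1.0 / n_particles] * n_particles
    total_reward = 0
    n_del = int(frac * n_particles)
    for _ in range(horizon):
        weights, _arm, reward = pts_step(particles, weights, true_means, rng)
        total_reward += reward
        # Trigger (assumption): the weakest particles carry negligible weight.
        if sum(sorted(weights)[:n_del]) < threshold:
            particles, weights = regenerate(particles, weights, rng, frac=frac)
    return particles, weights, total_reward


if __name__ == "__main__":
    _, w, reward = run_rpts([0.2, 0.5, 0.8])
    print("total reward:", reward, "max weight:", max(w))
```

Setting threshold to 0 disables regeneration, because the trigger can then never fire, which reduces the sketch to plain PTS with static particles and gives a simple baseline for comparison.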
