Staff View: :: Library Catalog

Saved in:

Bibliographic Details
Main Authors:	Werge, Nicklas, Wu, Yi-Shan, Akgül, Abdullah, Kandemir, Melih
Format:	Preprint
Published:	2023
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2307.03587
Tags:	Add Tag No Tags, Be the first to tag this record!

_version_	1866911188063879168
author	Werge, Nicklas Wu, Yi-Shan Akgül, Abdullah Kandemir, Melih
author_facet	Werge, Nicklas Wu, Yi-Shan Akgül, Abdullah Kandemir, Melih
contents	We study non-stationary linear contextual bandits through the lens of sequential Bayesian inference. Whereas existing algorithms typically rely on the Weighted Regularized Least-Squares (WRLS) objective, we study Weighted Sequential Bayesian (WSB), which maintains a posterior distribution over the time-varying reward parameters. Our main contribution is a novel concentration inequality for WSB posteriors, which introduces a prior-dependent term that quantifies the influence of initial beliefs. We show that this influence decays over time and derive tractable upper bounds that make the result useful for both analysis and algorithm design. Building on WSB, we introduce three algorithms: WSB-LinUCB, WSB-RandLinUCB, and WSB-LinTS. We establish frequentist regret guarantees: WSB-LinUCB matches the best-known WRLS-based guarantees, while WSB-RandLinUCB and WSB-LinTS improve upon them, all while preserving the computational efficiency of WRLS-based algorithms.
format	Preprint
id	arxiv_https___arxiv_org_abs_2307_03587
institution	arXiv
publishDate	2023
record_format	arxiv
spellingShingle	Weighted Sequential Bayesian Inference for Non-Stationary Linear Contextual Bandits Werge, Nicklas Wu, Yi-Shan Akgül, Abdullah Kandemir, Melih Machine Learning We study non-stationary linear contextual bandits through the lens of sequential Bayesian inference. Whereas existing algorithms typically rely on the Weighted Regularized Least-Squares (WRLS) objective, we study Weighted Sequential Bayesian (WSB), which maintains a posterior distribution over the time-varying reward parameters. Our main contribution is a novel concentration inequality for WSB posteriors, which introduces a prior-dependent term that quantifies the influence of initial beliefs. We show that this influence decays over time and derive tractable upper bounds that make the result useful for both analysis and algorithm design. Building on WSB, we introduce three algorithms: WSB-LinUCB, WSB-RandLinUCB, and WSB-LinTS. We establish frequentist regret guarantees: WSB-LinUCB matches the best-known WRLS-based guarantees, while WSB-RandLinUCB and WSB-LinTS improve upon them, all while preserving the computational efficiency of WRLS-based algorithms.
title	Weighted Sequential Bayesian Inference for Non-Stationary Linear Contextual Bandits
topic	Machine Learning
url	https://arxiv.org/abs/2307.03587

Similar Items