Saved in:
Bibliographic Details
Main Authors: Werge, Nicklas, Wu, Yi-Shan, Akgül, Abdullah, Kandemir, Melih
Format: Preprint
Published: 2023
Subjects:
Online Access:https://arxiv.org/abs/2307.03587
Tags: Add Tag
No Tags, Be the first to tag this record!
_version_ 1866911188063879168
author Werge, Nicklas
Wu, Yi-Shan
Akgül, Abdullah
Kandemir, Melih
author_facet Werge, Nicklas
Wu, Yi-Shan
Akgül, Abdullah
Kandemir, Melih
contents We study non-stationary linear contextual bandits through the lens of sequential Bayesian inference. Whereas existing algorithms typically rely on the Weighted Regularized Least-Squares (WRLS) objective, we study Weighted Sequential Bayesian (WSB), which maintains a posterior distribution over the time-varying reward parameters. Our main contribution is a novel concentration inequality for WSB posteriors, which introduces a prior-dependent term that quantifies the influence of initial beliefs. We show that this influence decays over time and derive tractable upper bounds that make the result useful for both analysis and algorithm design. Building on WSB, we introduce three algorithms: WSB-LinUCB, WSB-RandLinUCB, and WSB-LinTS. We establish frequentist regret guarantees: WSB-LinUCB matches the best-known WRLS-based guarantees, while WSB-RandLinUCB and WSB-LinTS improve upon them, all while preserving the computational efficiency of WRLS-based algorithms.
format Preprint
id arxiv_https___arxiv_org_abs_2307_03587
institution arXiv
publishDate 2023
record_format arxiv
spellingShingle Weighted Sequential Bayesian Inference for Non-Stationary Linear Contextual Bandits
Werge, Nicklas
Wu, Yi-Shan
Akgül, Abdullah
Kandemir, Melih
Machine Learning
We study non-stationary linear contextual bandits through the lens of sequential Bayesian inference. Whereas existing algorithms typically rely on the Weighted Regularized Least-Squares (WRLS) objective, we study Weighted Sequential Bayesian (WSB), which maintains a posterior distribution over the time-varying reward parameters. Our main contribution is a novel concentration inequality for WSB posteriors, which introduces a prior-dependent term that quantifies the influence of initial beliefs. We show that this influence decays over time and derive tractable upper bounds that make the result useful for both analysis and algorithm design. Building on WSB, we introduce three algorithms: WSB-LinUCB, WSB-RandLinUCB, and WSB-LinTS. We establish frequentist regret guarantees: WSB-LinUCB matches the best-known WRLS-based guarantees, while WSB-RandLinUCB and WSB-LinTS improve upon them, all while preserving the computational efficiency of WRLS-based algorithms.
title Weighted Sequential Bayesian Inference for Non-Stationary Linear Contextual Bandits
topic Machine Learning
url https://arxiv.org/abs/2307.03587