Saved in:
| Main Authors: | , , , |
|---|---|
| Format: | Preprint |
| Published: |
2023
|
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2307.03587 |
| Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
| _version_ | 1866911188063879168 |
|---|---|
| author | Werge, Nicklas Wu, Yi-Shan Akgül, Abdullah Kandemir, Melih |
| author_facet | Werge, Nicklas Wu, Yi-Shan Akgül, Abdullah Kandemir, Melih |
| contents | We study non-stationary linear contextual bandits through the lens of sequential Bayesian inference. Whereas existing algorithms typically rely on the Weighted Regularized Least-Squares (WRLS) objective, we study Weighted Sequential Bayesian (WSB), which maintains a posterior distribution over the time-varying reward parameters. Our main contribution is a novel concentration inequality for WSB posteriors, which introduces a prior-dependent term that quantifies the influence of initial beliefs. We show that this influence decays over time and derive tractable upper bounds that make the result useful for both analysis and algorithm design. Building on WSB, we introduce three algorithms: WSB-LinUCB, WSB-RandLinUCB, and WSB-LinTS. We establish frequentist regret guarantees: WSB-LinUCB matches the best-known WRLS-based guarantees, while WSB-RandLinUCB and WSB-LinTS improve upon them, all while preserving the computational efficiency of WRLS-based algorithms. |
| format | Preprint |
| id |
arxiv_https___arxiv_org_abs_2307_03587 |
| institution | arXiv |
| publishDate | 2023 |
| record_format | arxiv |
| spellingShingle | Weighted Sequential Bayesian Inference for Non-Stationary Linear Contextual Bandits Werge, Nicklas Wu, Yi-Shan Akgül, Abdullah Kandemir, Melih Machine Learning We study non-stationary linear contextual bandits through the lens of sequential Bayesian inference. Whereas existing algorithms typically rely on the Weighted Regularized Least-Squares (WRLS) objective, we study Weighted Sequential Bayesian (WSB), which maintains a posterior distribution over the time-varying reward parameters. Our main contribution is a novel concentration inequality for WSB posteriors, which introduces a prior-dependent term that quantifies the influence of initial beliefs. We show that this influence decays over time and derive tractable upper bounds that make the result useful for both analysis and algorithm design. Building on WSB, we introduce three algorithms: WSB-LinUCB, WSB-RandLinUCB, and WSB-LinTS. We establish frequentist regret guarantees: WSB-LinUCB matches the best-known WRLS-based guarantees, while WSB-RandLinUCB and WSB-LinTS improve upon them, all while preserving the computational efficiency of WRLS-based algorithms. |
| title | Weighted Sequential Bayesian Inference for Non-Stationary Linear Contextual Bandits |
| topic | Machine Learning |
| url | https://arxiv.org/abs/2307.03587 |