Staff View: :: Library Catalog

Saved in:

Bibliographic Details
Main Authors:	Li, Yingru, Liu, Liangqi, Pu, Wenqiang, Liang, Hao, Luo, Zhi-Quan
Format:	Preprint
Published:	2024
Subjects:	Machine Learning Artificial Intelligence Computer Science and Game Theory
Online Access:	https://arxiv.org/abs/2402.09456
Tags:	Add Tag No Tags, Be the first to tag this record!

_version_	1866929255907065856
author	Li, Yingru Liu, Liangqi Pu, Wenqiang Liang, Hao Luo, Zhi-Quan
author_facet	Li, Yingru Liu, Liangqi Pu, Wenqiang Liang, Hao Luo, Zhi-Quan
contents	This work tackles the complexities of multi-player scenarios in \emph{unknown games}, where the primary challenge lies in navigating the uncertainty of the environment through bandit feedback alongside strategic decision-making. We introduce Thompson Sampling (TS)-based algorithms that exploit the information of opponents' actions and reward structures, leading to a substantial reduction in experimental budgets -- achieving over tenfold improvements compared to conventional approaches. Notably, our algorithms demonstrate that, given specific reward structures, the regret bound depends logarithmically on the total action space, significantly alleviating the curse of multi-player. Furthermore, we unveil the \emph{Optimism-then-NoRegret} (OTN) framework, a pioneering methodology that seamlessly incorporates our advancements with established algorithms, showcasing its utility in practical scenarios such as traffic routing and radar sensing in the real world.
format	Preprint
id	arxiv_https___arxiv_org_abs_2402_09456
institution	arXiv
publishDate	2024
record_format	arxiv
spellingShingle	Optimistic Thompson Sampling for No-Regret Learning in Unknown Games Li, Yingru Liu, Liangqi Pu, Wenqiang Liang, Hao Luo, Zhi-Quan Machine Learning Artificial Intelligence Computer Science and Game Theory This work tackles the complexities of multi-player scenarios in \emph{unknown games}, where the primary challenge lies in navigating the uncertainty of the environment through bandit feedback alongside strategic decision-making. We introduce Thompson Sampling (TS)-based algorithms that exploit the information of opponents' actions and reward structures, leading to a substantial reduction in experimental budgets -- achieving over tenfold improvements compared to conventional approaches. Notably, our algorithms demonstrate that, given specific reward structures, the regret bound depends logarithmically on the total action space, significantly alleviating the curse of multi-player. Furthermore, we unveil the \emph{Optimism-then-NoRegret} (OTN) framework, a pioneering methodology that seamlessly incorporates our advancements with established algorithms, showcasing its utility in practical scenarios such as traffic routing and radar sensing in the real world.
title	Optimistic Thompson Sampling for No-Regret Learning in Unknown Games
topic	Machine Learning Artificial Intelligence Computer Science and Game Theory
url	https://arxiv.org/abs/2402.09456

Similar Items