Bibliographic Details
Main Authors:	Cohen, Asaf; He, Ruolan; Wang, Yuqiong
Format:	Preprint
Published:	2026
Subjects:	Optimization and Control
Online Access:	https://arxiv.org/abs/2601.20973

Abstract:

We study a stochastic differential game with $N$ competitive players in a linear-quadratic framework with ergodic cost, where the state dynamics are governed by $d$-dimensional diffusion processes with an unknown common drift matrix. Assuming a Gaussian prior on the drift, we use filtering techniques to update its posterior estimates. Based on these estimates, we propose a Thompson-sampling-based algorithm with dynamic episode lengths to approximate the players' strategies. We show that the Bayesian regret of each player is of order $O(\sqrt{T\log(T)})$, where $T$ is the time horizon, and that this bound is independent of the number of players. Dividing by $T$, the average regret per unit time is of order $O(\sqrt{\log(T)/T})$ and therefore goes to zero as $T \to \infty$. Finally, we prove that the algorithm results in a Nash equilibrium.
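To make the learning loop concrete, here is a minimal sketch under strong simplifying assumptions. It is not the authors' algorithm. It assumes a scalar state ($d = 1$) and player dynamics $dX^i_t = (\theta X^i_t + u^i_t)\,dt + \sigma\,dW^i_t$ sharing one unknown drift $\theta$. It also drops any interaction terms from the costs, so each player's best response to a sampled $\theta$ is the single-agent ergodic LQ gain for running cost $qX^2 + ru^2$. In place of the continuous-time filter it uses an Euler discretization with a conjugate Gaussian update, and it applies a hypothetical episode-length rule. The function names `lq_gain` and `thompson_lq` and all parameter values are illustrative.

```python
import math
import random


def lq_gain(theta, q=1.0, r=1.0):
    """Ergodic LQ feedback gain K (u = -K x) for dX = (theta X + u) dt + sigma dW
    with running cost q X^2 + r u^2; from the scalar Riccati equation,
    K = theta + sqrt(theta^2 + q/r)."""
    return theta + math.sqrt(theta * theta + q / r)


def thompson_lq(theta_true, n_players=3, horizon=200.0, dt=0.01, sigma=1.0,
                prior_mean=0.0, prior_var=1.0, q=1.0, r=1.0, seed=0):
    """Hypothetical Thompson-sampling loop with a shared Gaussian posterior on theta.
    Returns (posterior mean, posterior variance, episode start times)."""
    rng = random.Random(seed)
    # Gaussian posterior in natural parameters: precision and precision * mean.
    prec = 1.0 / prior_var
    prec_mean = prior_mean / prior_var
    x = [0.0] * n_players
    t = 0.0
    episode_starts = []
    prev_len = 0.0
    while t < horizon:
        # Episode start: sample a drift from the posterior and freeze the gain.
        episode_starts.append(t)
        prec_at_start = prec
        theta_hat = rng.gauss(prec_mean / prec, math.sqrt(1.0 / prec))
        gain = lq_gain(theta_hat, q, r)
        ep_len = 0.0
        # Hypothetical dynamic-episode rule (an assumption, not from the paper):
        # end the episode once it exceeds the previous length plus one time unit,
        # or once the posterior precision has doubled since the episode began.
        while t < horizon and ep_len < prev_len + 1.0 and prec < 2.0 * prec_at_start:
            for i in range(n_players):
                u = -gain * x[i]
                dx = (theta_true * x[i] + u) * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
                # Euler model: y = dx - u dt ~ N(theta x dt, sigma^2 dt), so the
                # conjugate update adds x^2 dt / sigma^2 to the precision and
                # x y / sigma^2 to precision * mean. All players share theta,
                # so every player's path feeds the same posterior.
                y = dx - u * dt
                prec += x[i] * x[i] * dt / sigma ** 2
                prec_mean += x[i] * y / sigma ** 2
                x[i] += dx
            t += dt
            ep_len += dt
        prev_len = ep_len
    return prec_mean / prec, 1.0 / prec, episode_starts
```

For example, `thompson_lq(theta_true=0.5)` returns the posterior mean and variance of $\theta$ together with the episode start times. Pooling all players' observations into a single posterior reflects the common drift described above. In the $d$-dimensional case the scalar gain formula would be replaced by a solution of the matrix algebraic Riccati equation.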
