Gespeichert in:
| Hauptverfasser: | , , |
|---|---|
| Format: | Preprint |
| Veröffentlicht: |
2025
|
| Schlagworte: | |
| Online-Zugang: | https://arxiv.org/abs/2502.02859 |
| Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
| _version_ | 1866911161328336896 |
|---|---|
| author | Zhang, Haochen Zheng, Zhong Xue, Lingzhou |
| author_facet | Zhang, Haochen Zheng, Zhong Xue, Lingzhou |
| contents | We present the first gap-dependent analysis of regret and communication cost for on-policy federated $Q$-Learning in tabular episodic finite-horizon Markov decision processes (MDPs). Existing FRL methods focus on worst-case scenarios, leading to $\sqrt{T}$-type regret bounds and communication cost bounds with a $\log T$ term scaling with the number of agents $M$, states $S$, and actions $A$, where $T$ is the average total number of steps per agent. In contrast, our novel framework leverages the benign structures of MDPs, such as a strictly positive suboptimality gap, to achieve a $\log T$-type regret bound and a refined communication cost bound that disentangles exploration and exploitation. Our gap-dependent regret bound reveals a distinct multi-agent speedup pattern, and our gap-dependent communication cost bound removes the dependence on $MSA$ from the $\log T$ term. Notably, our gap-dependent communication cost bound also yields a better global switching cost when $M=1$, removing $SA$ from the $\log T$ term. |
| format | Preprint |
| id |
arxiv_https___arxiv_org_abs_2502_02859 |
| institution | arXiv |
| publishDate | 2025 |
| record_format | arxiv |
| spellingShingle | Gap-Dependent Bounds for Federated $Q$-learning Zhang, Haochen Zheng, Zhong Xue, Lingzhou Machine Learning We present the first gap-dependent analysis of regret and communication cost for on-policy federated $Q$-Learning in tabular episodic finite-horizon Markov decision processes (MDPs). Existing FRL methods focus on worst-case scenarios, leading to $\sqrt{T}$-type regret bounds and communication cost bounds with a $\log T$ term scaling with the number of agents $M$, states $S$, and actions $A$, where $T$ is the average total number of steps per agent. In contrast, our novel framework leverages the benign structures of MDPs, such as a strictly positive suboptimality gap, to achieve a $\log T$-type regret bound and a refined communication cost bound that disentangles exploration and exploitation. Our gap-dependent regret bound reveals a distinct multi-agent speedup pattern, and our gap-dependent communication cost bound removes the dependence on $MSA$ from the $\log T$ term. Notably, our gap-dependent communication cost bound also yields a better global switching cost when $M=1$, removing $SA$ from the $\log T$ term. |
| title | Gap-Dependent Bounds for Federated $Q$-learning |
| topic | Machine Learning |
| url | https://arxiv.org/abs/2502.02859 |