| Main authors: | Chen, Xuyang; Zhao, Lin |
|---|---|
| Format: | Preprint |
| Published: | 2022 |
| Subjects: | Machine Learning; Optimization and Control |
| Online access: | https://arxiv.org/abs/2210.09921 |
| _version_ | 1866929224499068928 |
|---|---|
| author | Chen, Xuyang; Zhao, Lin |
| author_facet | Chen, Xuyang; Zhao, Lin |
| contents | Actor-critic methods have achieved significant success in many challenging applications. However, their finite-time convergence is still poorly understood in the most practical single-timescale form. Existing analyses of single-timescale actor-critic have been limited to i.i.d. sampling or the tabular setting for simplicity. We investigate the more practical online single-timescale actor-critic algorithm on a continuous state space, where the critic uses linear function approximation and updates with a single Markovian sample per actor step. Previous analyses have been unable to establish convergence in such a challenging scenario. We demonstrate that the online single-timescale actor-critic method provably finds an $\epsilon$-approximate stationary point with $\widetilde{\mathcal{O}}(\epsilon^{-2})$ sample complexity under standard assumptions, which can be further improved to $\mathcal{O}(\epsilon^{-2})$ under i.i.d. sampling. Our novel framework systematically evaluates and controls the error propagation between the actor and critic. It offers a promising approach for analyzing other single-timescale reinforcement learning algorithms as well. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2210_09921 |
| institution | arXiv |
| publishDate | 2022 |
| record_format | arxiv |
| title | Finite-time analysis of single-timescale actor-critic |
| topic | Machine Learning; Optimization and Control |
| url | https://arxiv.org/abs/2210.09921 |