| Main Authors: | Xiong, Zhihan; Fazel, Maryam; Xiao, Lin |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | Machine Learning |
| Online Access: | https://arxiv.org/abs/2410.01249 |

| _version_ | 1866909332911685632 |
|---|---|
| author | Xiong, Zhihan; Fazel, Maryam; Xiao, Lin |
| author_facet | Xiong, Zhihan; Fazel, Maryam; Xiao, Lin |
| contents | We propose Dual Approximation Policy Optimization (DAPO), a framework that incorporates general function approximation into policy mirror descent methods. In contrast to the popular approach of using the $L_2$-norm to measure function approximation errors, DAPO uses the dual Bregman divergence induced by the mirror map for policy projection. This duality framework has both theoretical and practical implications: not only does it achieve fast linear convergence with general function approximation, but it also includes several well-known practical methods as special cases, immediately providing strong convergence guarantees. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2410_01249 |
| institution | arXiv |
| publishDate | 2024 |
| record_format | arxiv |
| title | Dual Approximation Policy Optimization |
| topic | Machine Learning |
| url | https://arxiv.org/abs/2410.01249 |