Staff View: Dual Approximation Policy Optimization :: Library Catalog

Bibliographic Details
Main Authors:	Xiong, Zhihan; Fazel, Maryam; Xiao, Lin
Format:	Preprint
Published:	2024
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2410.01249

_version_	1866909332911685632
author	Xiong, Zhihan; Fazel, Maryam; Xiao, Lin
author_facet	Xiong, Zhihan; Fazel, Maryam; Xiao, Lin
contents	We propose Dual Approximation Policy Optimization (DAPO), a framework that incorporates general function approximation into policy mirror descent methods. In contrast to the popular approach of using the $L_2$-norm to measure function approximation errors, DAPO uses the dual Bregman divergence induced by the mirror map for policy projection. This duality framework has both theoretical and practical implications: not only does it achieve fast linear convergence with general function approximation, but it also includes several well-known practical methods as special cases, immediately providing strong convergence guarantees.
format	Preprint
id	arxiv_https___arxiv_org_abs_2410_01249
institution	arXiv
publishDate	2024
record_format	arxiv
title	Dual Approximation Policy Optimization
topic	Machine Learning
url	https://arxiv.org/abs/2410.01249

Similar Items