Bibliographic Details
Main Authors: Xiong, Zhihan, Fazel, Maryam, Xiao, Lin
Format: Preprint
Published: 2024
Subjects: Machine Learning
Online Access: https://arxiv.org/abs/2410.01249
_version_ 1866909332911685632
author Xiong, Zhihan
Fazel, Maryam
Xiao, Lin
contents We propose Dual Approximation Policy Optimization (DAPO), a framework that incorporates general function approximation into policy mirror descent methods. In contrast to the popular approach of using the $L_2$-norm to measure function approximation errors, DAPO uses the dual Bregman divergence induced by the mirror map for policy projection. This duality framework has both theoretical and practical implications: not only does it achieve fast linear convergence with general function approximation, but it also includes several well-known practical methods as special cases, immediately providing strong convergence guarantees.
format Preprint
id arxiv_https___arxiv_org_abs_2410_01249
institution arXiv
publishDate 2024
record_format arxiv
title Dual Approximation Policy Optimization
topic Machine Learning
url https://arxiv.org/abs/2410.01249
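Illustrative note on the abstract: the "dual Bregman divergence induced by the mirror map" can be made concrete in one standard special case. The sketch below is not the authors' DAPO algorithm and uses no function approximation. It assumes a single state with four actions, the negative-entropy mirror map on the action simplex (whose convex conjugate is log-sum-exp), and made-up action values. All names (`theta`, `q_values`, `eta`, `dual_bregman`) are hypothetical. Under these assumptions, a policy mirror descent step is additive in the dual variables (logits), and the Bregman divergence of the conjugate between the new and old logits equals the KL divergence from the old policy to the new one.

```python
import numpy as np


def log_sum_exp(theta):
    """Numerically stable log-sum-exp, the conjugate of negative entropy on the simplex."""
    m = np.max(theta)
    return m + np.log(np.sum(np.exp(theta - m)))


def softmax(theta):
    """Gradient of log-sum-exp; maps dual variables (logits) to a policy."""
    z = np.exp(theta - np.max(theta))
    return z / z.sum()


def dual_bregman(theta_new, theta_old):
    """Bregman divergence of log-sum-exp between two dual points."""
    grad_old = softmax(theta_old)
    return (log_sum_exp(theta_new) - log_sum_exp(theta_old)
            - grad_old @ (theta_new - theta_old))


def kl(p, q):
    """KL(p || q) for strictly positive probability vectors."""
    return float(np.sum(p * (np.log(p) - np.log(q))))


rng = np.random.default_rng(0)
theta = rng.normal(size=4)     # assumed logits of the current policy (one state)
q_values = rng.normal(size=4)  # assumed action-value estimates, not from the paper
eta = 0.5                      # assumed step size

# Mirror descent step with the negative-entropy mirror map:
# additive in the dual space, multiplicative-weights in the policy space.
theta_next = theta + eta * q_values

d_dual = float(dual_bregman(theta_next, theta))
d_kl = kl(softmax(theta), softmax(theta_next))
print(d_dual, d_kl)  # the two values agree up to floating-point error
```

The agreement of the two printed values is the textbook identity for this particular mirror map. How DAPO uses such a divergence for policy projection with general function approximation is described in the preprint itself.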