Bibliographic Details
Main Authors: Li, Yunxiang; Yuan, Rui; Fan, Chen; Schmidt, Mark; Horváth, Samuel; Gower, Robert M.; Takáč, Martin
Format: Preprint
Published: 2024
Subjects: Machine Learning
Online Access: https://arxiv.org/abs/2404.07525
author Li, Yunxiang
Yuan, Rui
Fan, Chen
Schmidt, Mark
Horváth, Samuel
Gower, Robert M.
Takáč, Martin
contents Policy gradient is a widely used, foundational algorithm in reinforcement learning (RL). Although it is known for its convergence guarantees and its stability relative to other RL algorithms, its practical application is often hindered by sensitivity to hyper-parameters, particularly the step-size. In this paper, we integrate the Polyak step-size into RL; it adjusts the step-size automatically, without prior knowledge. To adapt the method to RL settings, we address several issues, including the unknown optimal value f* that the Polyak step-size requires. Additionally, we evaluate the Polyak step-size in RL through experiments, which show faster convergence and more stable policies.
format Preprint
id arxiv_https___arxiv_org_abs_2404_07525
institution arXiv
publishDate 2024
record_format arxiv
title Enhancing Policy Gradient with the Polyak Step-Size Adaption
topic Machine Learning
url https://arxiv.org/abs/2404.07525
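For context on the abstract, the classical Polyak step-size for minimising f sets gamma = (f(x) - f*) / ||grad f(x)||^2, which is why knowing the optimal value f* matters. The sketch below is a minimal NumPy illustration, assuming f* is known and using a toy quadratic. It is not the RL adaptation described in the paper, which is about handling the case where f* is unknown. All function names (polyak_step_size, gd_with_polyak, polyak_ascent_step) are hypothetical.

```python
import numpy as np


def polyak_step_size(f_x, f_star, grad):
    """Classical Polyak step-size for minimisation:
    gamma = (f(x) - f*) / ||grad f(x)||^2, assuming f* is known."""
    return (f_x - f_star) / float(np.dot(grad, grad))


def gd_with_polyak(f, grad_f, x0, f_star, n_iters=50):
    """Gradient descent x <- x - gamma * grad f(x) using the Polyak step-size."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iters):
        g = grad_f(x)
        if not np.any(g):  # zero gradient: stationary point reached, stop
            break
        x = x - polyak_step_size(f(x), f_star, g) * g
    return x


def polyak_ascent_step(theta, j_theta, j_star, grad_j):
    """One gradient-ascent step for a maximisation objective J (e.g. expected return):
    theta <- theta + gamma * grad J, with gamma = (J* - J(theta)) / ||grad J||^2.
    Assumption: the optimal value J* is known here; in RL it usually is not."""
    gamma = (j_star - j_theta) / float(np.dot(grad_j, grad_j))
    return theta + gamma * grad_j


# Toy problem: f(x) = 0.5 * ||x||^2, whose minimum value f* = 0 is known.
def f(x):
    return 0.5 * float(np.dot(x, x))


def grad_f(x):
    return x


x_final = gd_with_polyak(f, grad_f, x0=[3.0, 4.0], f_star=0.0)
print(np.linalg.norm(x_final))
```

On this quadratic the Polyak step is exactly 0.5 at every iterate, so each step halves x. Policy gradient maximises the return, so the step flips sign as in polyak_ascent_step; the sketch simply assumes J* is available, which is the gap the paper says it addresses.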