Internal format :: Library Catalog

Bibliographic details
Main authors:	Donâncio, Henrique; Barrier, Antoine; South, Leah F.; Forbes, Florence
Format:	Preprint
Published:	2024
Subjects:	Machine Learning
Online access:	https://arxiv.org/abs/2410.12598

author	Donâncio, Henrique; Barrier, Antoine; South, Leah F.; Forbes, Florence
contents	In deep Reinforcement Learning (RL), the learning rate critically influences both stability and performance, yet its optimal value shifts during training as the environment and policy evolve. Standard decay schedulers assume monotonic convergence and often misalign with these dynamics, leading to premature or delayed adjustments. We introduce LRRL, a meta-learning approach that dynamically selects the learning rate based on policy performance rather than training steps. LRRL adaptively favors rates that improve returns, remaining robust even when the candidate set includes values that individually cause divergence. Across Atari and MuJoCo benchmarks, LRRL achieves performance competitive with or superior to tuned baselines and standard schedulers. Our findings position LRRL as a practical solution for adapting to non-stationary objectives in deep RL.
format	Preprint
id	arxiv_https___arxiv_org_abs_2410_12598
institution	arXiv
publishDate	2024
record_format	arxiv
title	Dynamic Learning Rate for Deep Reinforcement Learning: A Bandit Approach
topic	Machine Learning
url	https://arxiv.org/abs/2410.12598
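The contents field describes LRRL only at a high level: a bandit chooses among candidate learning rates according to policy performance rather than training steps. The Python sketch below illustrates that general idea under stated assumptions and is not the authors' implementation. It uses the standard Exp3 bandit algorithm as a generic stand-in, because the record does not say which bandit algorithm or reward signal LRRL actually uses. The binary reward ("did the return improve?"), the candidate values, TOY_IMPROVE_PROB, and the names Exp3LearningRateSelector and run_toy are all hypothetical.

```python
import math
import random


class Exp3LearningRateSelector:
    """Hypothetical Exp3 bandit whose arms are candidate learning rates (illustrative sketch)."""

    def __init__(self, candidates, gamma=0.1, seed=0):
        self.candidates = list(candidates)
        self.gamma = gamma                      # exploration rate in (0, 1]
        self.weights = [1.0] * len(self.candidates)
        self.rng = random.Random(seed)
        self.last_arm = None
        self.last_probs = None

    def probabilities(self):
        """Mix the normalized weights with a uniform distribution over arms."""
        k = len(self.weights)
        total = sum(self.weights)
        return [(1.0 - self.gamma) * w / total + self.gamma / k for w in self.weights]

    def select(self):
        """Sample an arm and return its learning rate."""
        self.last_probs = self.probabilities()
        arms = range(len(self.candidates))
        self.last_arm = self.rng.choices(arms, weights=self.last_probs)[0]
        return self.candidates[self.last_arm]

    def update(self, reward):
        """Exp3 update for the last selected arm; reward must lie in [0, 1]."""
        k = len(self.weights)
        estimate = reward / self.last_probs[self.last_arm]   # importance-weighted reward
        self.weights[self.last_arm] *= math.exp(self.gamma * estimate / k)
        top = max(self.weights)                               # rescale to avoid overflow;
        self.weights = [w / top for w in self.weights]        # probabilities are unchanged


# Hypothetical toy stand-in for "did the policy's return improve after training
# with this learning rate?"; 1.0 plays the role of a diverging learning rate.
CANDIDATES = [1e-4, 1e-3, 1e-2, 1.0]
TOY_IMPROVE_PROB = {1e-4: 0.3, 1e-3: 0.6, 1e-2: 0.9, 1.0: 0.0}


def run_toy(rounds=10_000, seed=0):
    """Run the selector against the toy reward and count how often each rate is chosen."""
    selector = Exp3LearningRateSelector(CANDIDATES, gamma=0.1, seed=seed)
    env_rng = random.Random(seed + 1)
    counts = {lr: 0 for lr in CANDIDATES}
    for _ in range(rounds):
        lr = selector.select()
        counts[lr] += 1
        # In real training, a learning step with `lr` plus an evaluation would run here.
        improved = env_rng.random() < TOY_IMPROVE_PROB[lr]
        selector.update(1.0 if improved else 0.0)
    return selector, counts


if __name__ == "__main__":
    selector, counts = run_toy()
    print("pull counts:", counts)
    print("final probabilities:", [round(p, 3) for p in selector.probabilities()])
```

With these toy probabilities, most selections should concentrate on 1e-2, while the never-improving rate 1.0 is still sampled with probability at least gamma/k because of the uniform exploration term. A real integration would replace the toy draw with an actual training and evaluation step.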