Staff View: Overcoming Non-stationary Dynamics with Evidential Proximal Policy Optimization :: Library Catalog

Bibliographic Details
Main Authors:	Akgül, Abdullah; Baykal, Gulcin; Haußmann, Manuel; Kandemir, Melih
Format:	Preprint
Published:	2025
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2503.01468

_version_	1866917058564849664
author	Akgül, Abdullah; Baykal, Gulcin; Haußmann, Manuel; Kandemir, Melih
author_facet	Akgül, Abdullah; Baykal, Gulcin; Haußmann, Manuel; Kandemir, Melih
contents	Continuous control of non-stationary environments is a major challenge for deep reinforcement learning algorithms. The time-dependency of the state transition dynamics aggravates the notorious stability problems of model-free deep actor-critic architectures. We posit that two properties will play a key role in overcoming non-stationarity in transition dynamics: (i) preserving the plasticity of the critic network and (ii) directed exploration for rapid adaptation to changing dynamics. We show that performing on-policy reinforcement learning with an evidential critic provides both. The evidential design ensures a fast and accurate approximation of the uncertainty around the state value, which maintains the plasticity of the critic network by detecting the distributional shifts caused by changes in dynamics. The probabilistic critic also makes the actor training objective a random variable, enabling the use of directed exploration approaches as a by-product. We name the resulting algorithm Evidential Proximal Policy Optimization (EPPO) due to the integral role of evidential uncertainty quantification in both policy evaluation and policy improvement stages. Through experiments on non-stationary continuous control tasks, where the environment dynamics change at regular intervals, we demonstrate that our algorithm outperforms state-of-the-art on-policy reinforcement learning variants in both task-specific and overall return.
format	Preprint
id	arxiv_https___arxiv_org_abs_2503_01468
institution	arXiv
publishDate	2025
record_format	arxiv
title	Overcoming Non-stationary Dynamics with Evidential Proximal Policy Optimization
topic	Machine Learning
url	https://arxiv.org/abs/2503.01468

Similar Items