Bibliographic Details
Main Authors: Akgül, Abdullah, Baykal, Gulcin, Haußmann, Manuel, Kandemir, Melih
Format: Preprint
Published: 2025
Subjects: Machine Learning
Online Access: https://arxiv.org/abs/2503.01468
author Akgül, Abdullah
Baykal, Gulcin
Haußmann, Manuel
Kandemir, Melih
contents Continuous control of non-stationary environments is a major challenge for deep reinforcement learning algorithms. The time-dependency of the state transition dynamics aggravates the notorious stability problems of model-free deep actor-critic architectures. We posit that two properties will play a key role in overcoming non-stationarity in transition dynamics: (i) preserving the plasticity of the critic network and (ii) directed exploration for rapid adaptation to changing dynamics. We show that performing on-policy reinforcement learning with an evidential critic provides both. The evidential design ensures a fast and accurate approximation of the uncertainty around the state value, which maintains the plasticity of the critic network by detecting the distributional shifts caused by changes in dynamics. The probabilistic critic also makes the actor training objective a random variable, enabling the use of directed exploration approaches as a by-product. We name the resulting algorithm Evidential Proximal Policy Optimization (EPPO) due to the integral role of evidential uncertainty quantification in both policy evaluation and policy improvement stages. Through experiments on non-stationary continuous control tasks, where the environment dynamics change at regular intervals, we demonstrate that our algorithm outperforms state-of-the-art on-policy reinforcement learning variants in both task-specific and overall return.
format Preprint
id arxiv_https___arxiv_org_abs_2503_01468
institution arXiv
publishDate 2025
record_format arxiv
title Overcoming Non-stationary Dynamics with Evidential Proximal Policy Optimization
topic Machine Learning
url https://arxiv.org/abs/2503.01468