Bibliographic details
Main authors: Wu, Andy; Lin, Chun-Cheng; Huang, Yuehua; Liaw, Rung-Tzuo
Format: Preprint
Published: 2025
Subjects: Machine Learning
Online access: https://arxiv.org/abs/2510.03330
author Wu, Andy
Lin, Chun-Cheng
Huang, Yuehua
Liaw, Rung-Tzuo
contents The training process of reinforcement learning often suffers from severe oscillations, leading to instability and degraded performance. In this paper, we propose a Constant in an Ever-Changing World (CIC) framework that enhances algorithmic stability to improve performance. CIC maintains both a representative policy and a current policy. Instead of updating the representative policy blindly, CIC selectively updates it only when the current policy demonstrates superiority. Furthermore, CIC employs an adaptive adjustment mechanism, enabling the representative and current policies to jointly facilitate critic training. We evaluate CIC on five MuJoCo environments, and the results show that CIC improves the performance of conventional algorithms without incurring additional computational cost.
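The selective update described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how superiority is measured, so the sketch assumes it means a strictly higher evaluation score, and `RepresentativeTracker`, `maybe_update`, `run_demo`, and the numbers in `history` are all hypothetical. The adaptive adjustment mechanism for critic training is not covered here, since the abstract gives no detail on it.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class RepresentativeTracker:
    """Hypothetical helper: keeps a snapshot of the representative policy."""
    params: List[float]
    score: float = float("-inf")

    def maybe_update(self, current_params: Sequence[float], current_score: float) -> bool:
        # Assumption: "superiority" means a strictly higher evaluation score.
        if current_score > self.score:
            # Copy the parameters so later training steps on the current
            # policy cannot silently change the stored snapshot.
            self.params = list(current_params)
            self.score = current_score
            return True
        return False


def run_demo():
    tracker = RepresentativeTracker(params=[0.0, 0.0])
    # Made-up (parameters, evaluation score) pairs from an oscillating run.
    history = [
        ([0.1, 0.2], 10.0),
        ([0.3, 0.1], 7.5),   # worse than the snapshot: ignored
        ([0.4, 0.5], 12.0),  # better: becomes the representative policy
        ([0.9, 0.9], 11.0),  # worse again: ignored
    ]
    updates = [tracker.maybe_update(p, s) for p, s in history]
    return tracker, updates
```

In this sketch the representative snapshot only ever improves, while the current policy is free to oscillate; that is one plausible reading of "selectively updates it only when the current policy demonstrates superiority".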
format Preprint
id arxiv_https___arxiv_org_abs_2510_03330
institution arXiv
publishDate 2025
record_format arxiv
title Constant in an Ever-Changing World
topic Machine Learning
url https://arxiv.org/abs/2510.03330