Enregistré dans:
| Auteurs principaux: | , , , |
|---|---|
| Format: | Preprint |
| Publié: |
2025
|
| Sujets: | |
| Accès en ligne: | https://arxiv.org/abs/2510.03330 |
| Tags: |
Ajouter un tag
Pas de tags, Soyez le premier à ajouter un tag!
|
| _version_ | 1866908575021924352 |
|---|---|
| author | Wu, Andy Lin, Chun-Cheng Huang, Yuehua Liaw, Rung-Tzuo |
| author_facet | Wu, Andy Lin, Chun-Cheng Huang, Yuehua Liaw, Rung-Tzuo |
| contents | The training process of reinforcement learning often suffers from severe oscillations, leading to instability and degraded performance. In this paper, we propose a Constant in an Ever-Changing World (CIC) framework that enhances algorithmic stability to improve performance. CIC maintains both a representative policy and a current policy. Instead of updating the representative policy blindly, CIC selectively updates it only when the current policy demonstrates superiority. Furthermore, CIC employs an adaptive adjustment mechanism, enabling the representative and current policies to jointly facilitate critic training. We evaluate CIC on five MuJoCo environments, and the results show that CIC improves the performance of conventional algorithms without incurring additional computational cost. |
| format | Preprint |
| id |
arxiv_https___arxiv_org_abs_2510_03330 |
| institution | arXiv |
| publishDate | 2025 |
| record_format | arxiv |
| spellingShingle | Constant in an Ever-Changing World Wu, Andy Lin, Chun-Cheng Huang, Yuehua Liaw, Rung-Tzuo Machine Learning The training process of reinforcement learning often suffers from severe oscillations, leading to instability and degraded performance. In this paper, we propose a Constant in an Ever-Changing World (CIC) framework that enhances algorithmic stability to improve performance. CIC maintains both a representative policy and a current policy. Instead of updating the representative policy blindly, CIC selectively updates it only when the current policy demonstrates superiority. Furthermore, CIC employs an adaptive adjustment mechanism, enabling the representative and current policies to jointly facilitate critic training. We evaluate CIC on five MuJoCo environments, and the results show that CIC improves the performance of conventional algorithms without incurring additional computational cost. |
| title | Constant in an Ever-Changing World |
| topic | Machine Learning |
| url | https://arxiv.org/abs/2510.03330 |