Bibliographic Details
Main Authors:	Wang, Yaomin; Pan, Jianting; Tian, Ran; Li, Xiaoyang; Zhang, Yu; Qin, Hengle; YU, Tianshu
Format:	Preprint
Published:	2026
Subjects:	Machine Learning; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2605.06149

_version_	1866914539582259200
author	Wang, Yaomin; Pan, Jianting; Tian, Ran; Li, Xiaoyang; Zhang, Yu; Qin, Hengle; YU, Tianshu
author_facet	Wang, Yaomin; Pan, Jianting; Tian, Ran; Li, Xiaoyang; Zhang, Yu; Qin, Hengle; YU, Tianshu
contents	The discount factor in reinforcement learning controls both the effective planning horizon and the strength of bootstrapping, yet most deep RL methods use a single fixed value across all states. While state-dependent discounting is conceptually appealing, naive deep actor-critic implementations can become unstable and degenerate toward TD-error collapse. We propose AdaGamma, a practical deep actor-critic method for state-dependent discounting that learns a state-dependent discount function together with a return-consistency objective to regularize the induced backup structure. On the theory side, we analyze the Bellman operator induced by state-dependent discounting and establish its basic well-posedness properties under suitable conditions. Empirically, AdaGamma integrates into both SAC and PPO, yielding consistent improvements on continuous-control benchmarks, and achieves statistically significant gains in an online A/B test on the JD Logistics platform. These results suggest that state-dependent discounting can be made effective in deep RL when coupled with a return-consistency objective that prevents degenerate target manipulation.
format	Preprint
id	arxiv_https___arxiv_org_abs_2605_06149
institution	arXiv
publishDate	2026
record_format	arxiv
title	AdaGamma: State-Dependent Discounting for Temporal Adaptation in Reinforcement Learning
topic	Machine Learning; Artificial Intelligence
url	https://arxiv.org/abs/2605.06149
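
Note on the abstract: the record does not give the paper's equations, so the following is only a minimal tabular sketch of what a Bellman backup with a state-dependent discount could look like. It is not the AdaGamma method: nothing is learned here, there is no return-consistency objective, and the per-state discounts are fixed by hand. The function name `evaluate_policy`, the choice to index the discount by the current state, and the toy two-state chain are all assumptions made for illustration.

```python
def evaluate_policy(P, r, gamma, tol=1e-10, max_iters=10_000):
    """Iterate V(s) <- r(s) + gamma(s) * sum_t P[s][t] * V(t).

    P:     row-stochastic transition matrix under a fixed policy (list of lists)
    r:     expected one-step reward per state
    gamma: per-state discount, each entry in [0, 1) (hypothetical, hand-chosen)
    """
    n = len(r)
    if any(g < 0.0 or g >= 1.0 for g in gamma):
        raise ValueError("each gamma(s) must lie in [0, 1)")
    V = [0.0] * n
    for _ in range(max_iters):
        # One backup: the discount applied depends on the state being updated.
        V_new = [
            r[s] + gamma[s] * sum(P[s][t] * V[t] for t in range(n))
            for s in range(n)
        ]
        # Stop once successive iterates agree to within tol in the sup norm.
        if max(abs(a - b) for a, b in zip(V_new, V)) < tol:
            return V_new
        V = V_new
    return V


# Toy chain (assumed): state 0 pays reward 1 and moves to state 1,
# state 1 pays nothing and moves back to state 0.
P = [[0.0, 1.0], [1.0, 0.0]]
r = [1.0, 0.0]
gamma = [0.5, 0.9]  # short horizon in state 0, long horizon in state 1
V = evaluate_policy(P, r, gamma)
```

In this sketch the iteration converges because every discount is strictly below 1, which makes the backup a sup-norm contraction with modulus equal to the largest per-state discount. Whether the paper's well-posedness conditions take this form is not stated in the record.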
