Bibliographic Details
Main authors: Wang, Yaomin, Pan, Jianting, Tian, Ran, Li, Xiaoyang, Zhang, Yu, Qin, Hengle, YU, Tianshu
Format: Preprint
Published: 2026
Subjects:
Online access: https://arxiv.org/abs/2605.06149
author Wang, Yaomin
Pan, Jianting
Tian, Ran
Li, Xiaoyang
Zhang, Yu
Qin, Hengle
YU, Tianshu
contents The discount factor in reinforcement learning controls both the effective planning horizon and the strength of bootstrapping, yet most deep RL methods use a single fixed value across all states. While state-dependent discounting is conceptually appealing, naive deep actor-critic implementations can become unstable and degenerate toward TD-error collapse. We propose AdaGamma, a practical deep actor-critic method for state-dependent discounting that learns a state-dependent discount function together with a return-consistency objective to regularize the induced backup structure. On the theory side, we analyze the Bellman operator induced by state-dependent discounting and establish its basic well-posedness properties under suitable conditions. Empirically, AdaGamma integrates into both SAC and PPO, yielding consistent improvements on continuous-control benchmarks, and achieves statistically significant gains in an online A/B test on the JD Logistics platform. These results suggest that state-dependent discounting can be made effective in deep RL when coupled with a return-consistency objective that prevents degenerate target manipulation.
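The abstract's reference to a Bellman operator with state-dependent discounting can be illustrated with a small tabular sketch. This is not the paper's method or its analysis: the 3-state Markov reward process, the numbers, the function names, and the choice to attach the discount to the current state are all assumptions made for illustration. The sketch relies on one standard sufficient condition, namely that every per-state discount is strictly below 1, under which the backup is a sup-norm contraction and repeated application converges to a unique fixed point.

```python
# Hypothetical 3-state Markov reward process; every number here is illustrative
# and none of it is taken from the AdaGamma paper.
P = [
    [0.9, 0.1, 0.0],
    [0.0, 0.5, 0.5],
    [0.2, 0.0, 0.8],
]  # row-stochastic transition matrix
r = [0.0, 1.0, -0.5]         # expected reward in each state
gamma = [0.99, 0.90, 0.50]   # assumed per-state discount, each strictly below 1


def bellman_backup(v):
    """Apply (T v)(s) = r(s) + gamma(s) * sum_s' P(s, s') * v(s')."""
    n = len(v)
    return [
        r[s] + gamma[s] * sum(P[s][t] * v[t] for t in range(n))
        for s in range(n)
    ]


def sup_distance(u, v):
    """Sup-norm distance between two value vectors."""
    return max(abs(a - b) for a, b in zip(u, v))


def evaluate(tol=1e-10, max_iters=10_000):
    """Iterate the backup from zero until successive iterates differ by < tol."""
    v = [0.0] * len(r)
    for _ in range(max_iters):
        new_v = bellman_backup(v)
        if sup_distance(new_v, v) < tol:
            return new_v
        v = new_v
    return v


if __name__ == "__main__":
    print(evaluate())
```

With a single fixed discount the list `gamma` would hold one repeated value; letting it vary per state shortens or lengthens the effective horizon locally while the contraction modulus stays at `max(gamma)`.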
format Preprint
id arxiv_https___arxiv_org_abs_2605_06149
institution arXiv
publishDate 2026
record_format arxiv
title AdaGamma: State-Dependent Discounting for Temporal Adaptation in Reinforcement Learning
topic Machine Learning
Artificial Intelligence
url https://arxiv.org/abs/2605.06149