Bibliographic Details
Main Authors: Panda, Saunak Kumar; Li, Tong; Liu, Ruiqi; Xiang, Yisha
Format: Preprint
Published: 2026
Subjects: Machine Learning; Artificial Intelligence
Online Access: https://arxiv.org/abs/2603.26982
_version_ 1866914429043474432
author Panda, Saunak Kumar
Li, Tong
Liu, Ruiqi
Xiang, Yisha
contents Reinforcement learning algorithms have been widely used for decision-making tasks in various domains. However, their performance can suffer from high variance and instability, particularly in environments with noise or sparse rewards. In this paper, we propose a framework for online statistical inference on a sample-averaged Q-learning approach. We adapt the functional central limit theorem (FCLT) to the modified algorithm under some general conditions and then construct confidence intervals for the Q-values via random scaling. We conduct experiments that apply random-scaling inference to both the modified approach and its traditional counterpart, Q-learning, and report coverage rates and confidence interval widths on two problems: a grid world problem as a simple toy example and a dynamic resource-matching problem as a real-world example, in order to compare the two solution approaches.
format Preprint
id arxiv_https___arxiv_org_abs_2603_26982
institution arXiv
publishDate 2026
record_format arxiv
title Online Statistical Inference of Constant Sample-averaged Q-Learning
topic Machine Learning
Artificial Intelligence
62L12, 90C40
I.2.6; G.3
url https://arxiv.org/abs/2603.26982