

Bibliographic Details
Main Authors:	Panda, Saunak Kumar; Li, Tong; Liu, Ruiqi; Xiang, Yisha
Format:	Preprint
Published:	2026
Subjects:	Machine Learning; Artificial Intelligence; 62L12, 90C40; I.2.6; G.3
Online Access:	https://arxiv.org/abs/2603.26982

_version_	1866914429043474432
author	Panda, Saunak Kumar; Li, Tong; Liu, Ruiqi; Xiang, Yisha
author_facet	Panda, Saunak Kumar; Li, Tong; Liu, Ruiqi; Xiang, Yisha
contents	Reinforcement learning algorithms are widely used for decision-making tasks across many domains. However, their performance can suffer from high variance and instability, particularly in environments with noise or sparse rewards. In this paper, we propose a framework for online statistical inference for a sample-averaged Q-learning approach. We adapt the functional central limit theorem (FCLT) to the modified algorithm under some general conditions and then construct confidence intervals for the Q-values via random scaling. We run experiments that perform inference, using random scaling, on both the modified approach and its traditional counterpart, Q-learning, and report coverage rates and confidence interval widths on two problems: a grid world problem as a simple toy example and a dynamic resource-matching problem as a real-world example, to compare the two solution approaches.
format	Preprint
id	arxiv_https___arxiv_org_abs_2603_26982
institution	arXiv
publishDate	2026
record_format	arxiv
title	Online Statistical Inference of Constant Sample-averaged Q-Learning
topic	Machine Learning; Artificial Intelligence; 62L12, 90C40; I.2.6; G.3
url	https://arxiv.org/abs/2603.26982
