स्टाफ के लिए:

में बचाया:

ग्रंथसूची विवरण
मुख्य लेखकों:	Johnson, Emmeran, Pike-Burke, Ciara, Rebeschini, Patrick
स्वरूप:	Preprint
प्रकाशित:	2023
विषय:	Machine Learning Artificial Intelligence
ऑनलाइन पहुंच:	https://arxiv.org/abs/2310.01616
टैग:	टैग जोड़ें कोई टैग नहीं, इस रिकॉर्ड को टैग करने वाले पहले व्यक्ति बनें!

_version_	1866916263136067584
author	Johnson, Emmeran Pike-Burke, Ciara Rebeschini, Patrick
author_facet	Johnson, Emmeran Pike-Burke, Ciara Rebeschini, Patrick
contents	We theoretically explore the relationship between sample-efficiency and adaptivity in reinforcement learning. An algorithm is sample-efficient if it uses a number of queries $n$ to the environment that is polynomial in the dimension $d$ of the problem. Adaptivity refers to the frequency at which queries are sent and feedback is processed to update the querying strategy. To investigate this interplay, we employ a learning framework that allows sending queries in $K$ batches, with feedback being processed and queries updated after each batch. This model encompasses the whole adaptivity spectrum, ranging from non-adaptive 'offline' ($K=1$) to fully adaptive ($K=n$) scenarios, and regimes in between. For the problems of policy evaluation and best-policy identification under $d$-dimensional linear function approximation, we establish $Ω(\log \log d)$ lower bounds on the number of batches $K$ required for sample-efficient algorithms with $n = O(poly(d))$ queries. Our results show that just having adaptivity ($K>1$) does not necessarily guarantee sample-efficiency. Notably, the adaptivity-boundary for sample-efficiency is not between offline reinforcement learning ($K=1$), where sample-efficiency was known to not be possible, and adaptive settings. Instead, the boundary lies between different regimes of adaptivity and depends on the problem dimension.
format	Preprint
id	arxiv_https___arxiv_org_abs_2310_01616
institution	arXiv
publishDate	2023
record_format	arxiv
spellingShingle	Sample-Efficiency in Multi-Batch Reinforcement Learning: The Need for Dimension-Dependent Adaptivity Johnson, Emmeran Pike-Burke, Ciara Rebeschini, Patrick Machine Learning Artificial Intelligence We theoretically explore the relationship between sample-efficiency and adaptivity in reinforcement learning. An algorithm is sample-efficient if it uses a number of queries $n$ to the environment that is polynomial in the dimension $d$ of the problem. Adaptivity refers to the frequency at which queries are sent and feedback is processed to update the querying strategy. To investigate this interplay, we employ a learning framework that allows sending queries in $K$ batches, with feedback being processed and queries updated after each batch. This model encompasses the whole adaptivity spectrum, ranging from non-adaptive 'offline' ($K=1$) to fully adaptive ($K=n$) scenarios, and regimes in between. For the problems of policy evaluation and best-policy identification under $d$-dimensional linear function approximation, we establish $Ω(\log \log d)$ lower bounds on the number of batches $K$ required for sample-efficient algorithms with $n = O(poly(d))$ queries. Our results show that just having adaptivity ($K>1$) does not necessarily guarantee sample-efficiency. Notably, the adaptivity-boundary for sample-efficiency is not between offline reinforcement learning ($K=1$), where sample-efficiency was known to not be possible, and adaptive settings. Instead, the boundary lies between different regimes of adaptivity and depends on the problem dimension.
title	Sample-Efficiency in Multi-Batch Reinforcement Learning: The Need for Dimension-Dependent Adaptivity
topic	Machine Learning Artificial Intelligence
url	https://arxiv.org/abs/2310.01616

समान संसाधन