MARC21: :: Library Catalog

Salvato in:

Dettagli Bibliografici
Autori principali:	Huang, Ruiquan, Li, Donghao, Shi, Chengshuai, Shen, Cong, Yang, Jing
Natura:	Preprint
Pubblicazione:	2025
Soggetti:	Machine Learning
Accesso online:	https://arxiv.org/abs/2505.13768
Tags:	Aggiungi Tag Nessun Tag, puoi essere il primo ad aggiungerne!!

_version_	1866912455550042112
author	Huang, Ruiquan Li, Donghao Shi, Chengshuai Shen, Cong Yang, Jing
author_facet	Huang, Ruiquan Li, Donghao Shi, Chengshuai Shen, Cong Yang, Jing
contents	This paper investigates a hybrid learning framework for reinforcement learning (RL) in which the agent can leverage both an offline dataset and online interactions to learn the optimal policy. We present a unified algorithm and analysis and show that augmenting confidence-based online RL algorithms with the offline dataset outperforms any pure online or offline algorithm alone and achieves state-of-the-art results under two learning metrics, i.e., sub-optimality gap and online learning regret. Specifically, we show that our algorithm achieves a sub-optimality gap $\tilde{O}(\sqrt{1/(N_0/\mathtt{C}(π^\|ρ)+N_1}) )$, where $\mathtt{C}(π^\|ρ)$ is a new concentrability coefficient, $N_0$ and $N_1$ are the numbers of offline and online samples, respectively. For regret minimization, we show that it achieves a constant $\tilde{O}( \sqrt{N_1/(N_0/\mathtt{C}(π^{-}\|ρ)+N_1)} )$ speed-up compared to pure online learning, where $\mathtt{C}(π^-\|ρ)$ is the concentrability coefficient over all sub-optimal policies. Our results also reveal an interesting separation on the desired coverage properties of the offline dataset for sub-optimality gap minimization and regret minimization. We further validate our theoretical findings in several experiments in special RL models such as linear contextual bandits and Markov decision processes (MDPs).
format	Preprint
id	arxiv_https___arxiv_org_abs_2505_13768
institution	arXiv
publishDate	2025
record_format	arxiv
spellingShingle	Augmenting Online RL with Offline Data is All You Need: A Unified Hybrid RL Algorithm Design and Analysis Huang, Ruiquan Li, Donghao Shi, Chengshuai Shen, Cong Yang, Jing Machine Learning This paper investigates a hybrid learning framework for reinforcement learning (RL) in which the agent can leverage both an offline dataset and online interactions to learn the optimal policy. We present a unified algorithm and analysis and show that augmenting confidence-based online RL algorithms with the offline dataset outperforms any pure online or offline algorithm alone and achieves state-of-the-art results under two learning metrics, i.e., sub-optimality gap and online learning regret. Specifically, we show that our algorithm achieves a sub-optimality gap $\tilde{O}(\sqrt{1/(N_0/\mathtt{C}(π^\|ρ)+N_1}) )$, where $\mathtt{C}(π^\|ρ)$ is a new concentrability coefficient, $N_0$ and $N_1$ are the numbers of offline and online samples, respectively. For regret minimization, we show that it achieves a constant $\tilde{O}( \sqrt{N_1/(N_0/\mathtt{C}(π^{-}\|ρ)+N_1)} )$ speed-up compared to pure online learning, where $\mathtt{C}(π^-\|ρ)$ is the concentrability coefficient over all sub-optimal policies. Our results also reveal an interesting separation on the desired coverage properties of the offline dataset for sub-optimality gap minimization and regret minimization. We further validate our theoretical findings in several experiments in special RL models such as linear contextual bandits and Markov decision processes (MDPs).
title	Augmenting Online RL with Offline Data is All You Need: A Unified Hybrid RL Algorithm Design and Analysis
topic	Machine Learning
url	https://arxiv.org/abs/2505.13768

Documenti analoghi