Salvato in:
| Autori principali: | , , , , |
|---|---|
| Natura: | Preprint |
| Pubblicazione: |
2025
|
| Soggetti: | |
| Accesso online: | https://arxiv.org/abs/2505.13768 |
| Tags: |
Aggiungi Tag
Nessun Tag, puoi essere il primo ad aggiungerne!!
|
| _version_ | 1866912455550042112 |
|---|---|
| author | Huang, Ruiquan Li, Donghao Shi, Chengshuai Shen, Cong Yang, Jing |
| author_facet | Huang, Ruiquan Li, Donghao Shi, Chengshuai Shen, Cong Yang, Jing |
| contents | This paper investigates a hybrid learning framework for reinforcement learning (RL) in which the agent can leverage both an offline dataset and online interactions to learn the optimal policy. We present a unified algorithm and analysis and show that augmenting confidence-based online RL algorithms with the offline dataset outperforms any pure online or offline algorithm alone and achieves state-of-the-art results under two learning metrics, i.e., sub-optimality gap and online learning regret. Specifically, we show that our algorithm achieves a sub-optimality gap $\tilde{O}(\sqrt{1/(N_0/\mathtt{C}(π^*|ρ)+N_1}) )$, where $\mathtt{C}(π^*|ρ)$ is a new concentrability coefficient, $N_0$ and $N_1$ are the numbers of offline and online samples, respectively. For regret minimization, we show that it achieves a constant $\tilde{O}( \sqrt{N_1/(N_0/\mathtt{C}(π^{-}|ρ)+N_1)} )$ speed-up compared to pure online learning, where $\mathtt{C}(π^-|ρ)$ is the concentrability coefficient over all sub-optimal policies. Our results also reveal an interesting separation on the desired coverage properties of the offline dataset for sub-optimality gap minimization and regret minimization. We further validate our theoretical findings in several experiments in special RL models such as linear contextual bandits and Markov decision processes (MDPs). |
| format | Preprint |
| id |
arxiv_https___arxiv_org_abs_2505_13768 |
| institution | arXiv |
| publishDate | 2025 |
| record_format | arxiv |
| spellingShingle | Augmenting Online RL with Offline Data is All You Need: A Unified Hybrid RL Algorithm Design and Analysis Huang, Ruiquan Li, Donghao Shi, Chengshuai Shen, Cong Yang, Jing Machine Learning This paper investigates a hybrid learning framework for reinforcement learning (RL) in which the agent can leverage both an offline dataset and online interactions to learn the optimal policy. We present a unified algorithm and analysis and show that augmenting confidence-based online RL algorithms with the offline dataset outperforms any pure online or offline algorithm alone and achieves state-of-the-art results under two learning metrics, i.e., sub-optimality gap and online learning regret. Specifically, we show that our algorithm achieves a sub-optimality gap $\tilde{O}(\sqrt{1/(N_0/\mathtt{C}(π^*|ρ)+N_1}) )$, where $\mathtt{C}(π^*|ρ)$ is a new concentrability coefficient, $N_0$ and $N_1$ are the numbers of offline and online samples, respectively. For regret minimization, we show that it achieves a constant $\tilde{O}( \sqrt{N_1/(N_0/\mathtt{C}(π^{-}|ρ)+N_1)} )$ speed-up compared to pure online learning, where $\mathtt{C}(π^-|ρ)$ is the concentrability coefficient over all sub-optimal policies. Our results also reveal an interesting separation on the desired coverage properties of the offline dataset for sub-optimality gap minimization and regret minimization. We further validate our theoretical findings in several experiments in special RL models such as linear contextual bandits and Markov decision processes (MDPs). |
| title | Augmenting Online RL with Offline Data is All You Need: A Unified Hybrid RL Algorithm Design and Analysis |
| topic | Machine Learning |
| url | https://arxiv.org/abs/2505.13768 |