| Main Authors: | Tan, Kevin; Xu, Ziping |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2403.09701 |
Similar Items
Online learning in bandits with predicted context
by: Guo, Yongyi, et al.
Published: (2023)
Augmenting Online RL with Offline Data is All You Need: A Unified Hybrid RL Algorithm Design and Analysis
by: Huang, Ruiquan, et al.
Published: (2025)
Online Algorithms with Limited Data Retention
by: Immorlica, Nicole, et al.
Published: (2024)
RL Token: Bootstrapping Online RL with Vision-Language-Action Models
by: Xu, Charles, et al.
Published: (2026)
Bayesian Online Natural Gradient (BONG)
by: Jones, Matt, et al.
Published: (2024)
A Review of Online Diffusion Policy RL Algorithms for Scalable Robotic Control
by: Choi, Wonhyeok, et al.
Published: (2026)
Accelerating Transformers in Online RL
by: Zelezetsky, Daniil, et al.
Published: (2025)
A Benchmark Study of Deep-RL Methods for Maximum Coverage Problems over Graphs
by: Liang, Zhicheng, et al.
Published: (2024)
The Fallacy of Minimizing Cumulative Regret in the Sequential Task Setting
by: Xu, Ziping, et al.
Published: (2024)
Robust Policy Expansion for Offline-to-Online RL under Diverse Data Corruption
by: He, Longxiang, et al.
Published: (2025)
Sample Efficient Myopic Exploration Through Multitask Reinforcement Learning with Diverse Tasks
by: Xu, Ziping, et al.
Published: (2024)
$π_\texttt{RL}$: Online RL Fine-tuning for Flow-based Vision-Language-Action Models
by: Chen, Kang, et al.
Published: (2025)
Coverage-Validity-Aware Algorithmic Recourse
by: Bui, Ngoc, et al.
Published: (2023)
H2O+: An Improved Framework for Hybrid Offline-and-Online RL with Dynamics Gaps
by: Niu, Haoyi, et al.
Published: (2023)
Policy Agnostic RL: Offline RL and Online RL Fine-Tuning of Any Class and Backbone
by: Mark, Max Sobol, et al.
Published: (2024)
Reveal the Mystery of DPO: The Connection between DPO and RL Algorithms
by: Su, Xuerui, et al.
Published: (2025)
Reflect-RL: Two-Player Online RL Fine-Tuning for LMs
by: Zhou, Runlong, et al.
Published: (2024)
An Empirical Study on the Effectiveness of Incorporating Offline RL As Online RL Subroutines
by: Su, Jianhai, et al.
Published: (2025)
Hybrid Reinforcement Learning Breaks Sample Size Barriers in Linear MDPs
by: Tan, Kevin, et al.
Published: (2024)
Generalized Linear Markov Decision Process
by: Zhang, Sinian, et al.
Published: (2025)
Outcome-Based Online Reinforcement Learning: Algorithms and Fundamental Limits
by: Chen, Fan, et al.
Published: (2025)
Training-Conditional Coverage Bounds for Uniformly Stable Learning Algorithms
by: Pournaderi, Mehrdad, et al.
Published: (2024)
Enhancing Adversarial Example Detection Through Model Explanation
by: Ma, Qian, et al.
Published: (2025)
On Entropy Control in LLM-RL Algorithms
by: Shen, Han
Published: (2025)
SortedRL: Accelerating RL Training for LLMs through Online Length-Aware Scheduling
by: Zhang, Yiqi, et al.
Published: (2026)
Active Measuring in Reinforcement Learning With Delayed Negative Effects
by: Gao, Daiqi, et al.
Published: (2025)
On the Limits of Tabular Hardness Metrics for Deep RL: A Study with the Pharos Benchmark
by: Conserva, Michelangelo, et al.
Published: (2025)
RL's Razor: Why Online Reinforcement Learning Forgets Less
by: Shenfeld, Idan, et al.
Published: (2025)
Algorithmic Guarantees for Distilling Supervised and Offline RL Datasets
by: Gupta, Aaryan, et al.
Published: (2025)
Behavior-Adaptive Q-Learning: A Unifying Framework for Offline-to-Online RL
by: Zu, Lipeng, et al.
Published: (2025)
MobileRL: Online Agentic Reinforcement Learning for Mobile GUI Agents
by: Xu, Yifan, et al.
Published: (2025)
Natural Policy Gradient for Average Reward Non-Stationary RL
by: Jali, Neharika, et al.
Published: (2025)
Curiosity-driven RL for symbolic equation solving
by: O'Keeffe, Kevin P.
Published: (2025)
Optimal Single-Policy Sample Complexity and Transient Coverage for Average-Reward Offline RL
by: Zurek, Matthew, et al.
Published: (2025)
A Theoretical Lens for RL-Tuned Language Models via Energy-Based Models
by: Tan, Zhiquan, et al.
Published: (2025)
Hybrid Deep Learning Modeling Approach to Predict Natural Gas Consumption of Home Subscribers on Limited Data
by: Firoozeh, Milad, et al.
Published: (2025)
RL Grokking Recipe: How Does RL Unlock and Transfer New Algorithms in LLMs?
by: Sun, Yiyou, et al.
Published: (2025)
Online Finetuning Decision Transformers with Pure RL Gradients
by: Luo, Junkai, et al.
Published: (2026)
Accelerating Goal-Conditioned RL Algorithms and Research
by: Bortkiewicz, Michał, et al.
Published: (2024)
Scalable Policy-Based RL Algorithms for POMDPs
by: Anjarlekar, Ameya, et al.
Published: (2025)