| Main Authors: | Huang, Ruiquan, Li, Donghao, Shi, Chengshuai, Shen, Cong, Yang, Jing |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2505.13768 |
Similar Items
Synthetic Data RL: Task Definition Is All You Need
by: Guo, Yiduo, et al.
Published: (2025)
Efficient Multi-objective Prompt Optimization via Pure-exploration Bandits
by: Li, Donghao, et al.
Published: (2026)
Augmenting Offline RL with Unlabeled Data
by: Wang, Zhao, et al.
Published: (2024)
Breaking the Computational Barrier: Provably Efficient Actor-Critic for Low-Rank MDPs
by: Huang, Ruiquan, et al.
Published: (2026)
$f$-Divergence Regularized RLHF: Two Tales of Sampling and Unified Analyses
by: Wu, Di, et al.
Published: (2026)
Robust Offline Reinforcement Learning for Non-Markovian Decision Processes
by: Huang, Ruiquan, et al.
Published: (2024)
Robust Policy Expansion for Offline-to-Online RL under Diverse Data Corruption
by: He, Longxiang, et al.
Published: (2025)
An Empirical Study on the Effectiveness of Incorporating Offline RL As Online RL Subroutines
by: Su, Jianhai, et al.
Published: (2025)
Behavior-Adaptive Q-Learning: A Unifying Framework for Offline-to-Online RL
by: Zu, Lipeng, et al.
Published: (2025)
Policy Agnostic RL: Offline RL and Online RL Fine-Tuning of Any Class and Backbone
by: Mark, Max Sobol, et al.
Published: (2024)
Don't Trade Off Safety: Diffusion Regularization for Constrained Offline RL
by: Guo, Junyu, et al.
Published: (2025)
Making Offline RL Online: Collaborative World Models for Offline Visual Reinforcement Learning
by: Wang, Qi, et al.
Published: (2023)
Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models
by: Li, Pengyi, et al.
Published: (2025)
Offline vs. Online Learning in Model-based RL: Lessons for Data Collection Strategies
by: Chen, Jiaqi, et al.
Published: (2025)
Decoupled Prioritized Resampling for Offline RL
by: Yue, Yang, et al.
Published: (2023)
H2O+: An Improved Framework for Hybrid Offline-and-Online RL with Dynamics Gaps
by: Niu, Haoyi, et al.
Published: (2023)
Algorithmic Guarantees for Distilling Supervised and Offline RL Datasets
by: Gupta, Aaryan, et al.
Published: (2025)
Greedy Sampling Is Provably Efficient for RLHF
by: Wu, Di, et al.
Published: (2025)
Provably Efficient UCB-type Algorithms For Learning Predictive State Representations
by: Huang, Ruiquan, et al.
Published: (2023)
Design Considerations in Offline Preference-based RL
by: Agarwal, Alekh, et al.
Published: (2025)
Transformers as Game Players: Provable In-context Game-playing Capabilities of Pre-trained Models
by: Shi, Chengshuai, et al.
Published: (2024)
Optimistic Critic Reconstruction and Constrained Fine-Tuning for General Offline-to-Online RL
by: Luo, Qin-Wen, et al.
Published: (2024)
Federated Online Prediction from Experts with Differential Privacy: Separations and Regret Speed-ups
by: Gao, Fengyu, et al.
Published: (2024)
A Natural Extension To Online Algorithms For Hybrid RL With Limited Coverage
by: Tan, Kevin, et al.
Published: (2024)
Diffusion Models as Optimizers for Efficient Planning in Offline RL
by: Huang, Renming, et al.
Published: (2024)
Solving Continual Offline RL through Selective Weights Activation on Aligned Spaces
by: Hu, Jifeng, et al.
Published: (2024)
Budgeting Counterfactual for Offline RL
by: Liu, Yao, et al.
Published: (2023)
SortedRL: Accelerating RL Training for LLMs through Online Length-Aware Scheduling
by: Zhang, Yiqi, et al.
Published: (2026)
Enhancing RL Generalizability in Robotics through SHAP Analysis of Algorithms and Hyperparameters
by: Kong, Lingxiao, et al.
Published: (2026)
Offline Inverse RL: New Solution Concepts and Provably Efficient Algorithms
by: Lazzati, Filippo, et al.
Published: (2024)
Improving Offline RL by Blending Heuristics
by: Geng, Sinong, et al.
Published: (2023)
When Are RL Hyperparameters Benign? A Study in Offline Goal-Conditioned RL
by: Töpperwien, Jan Malte, et al.
Published: (2026)
Action-Free Offline-to-Online RL via Discretised State Policies
by: Neggatu, Natinael Solomon, et al.
Published: (2026)
ToolRL: Reward is All Tool Learning Needs
by: Qian, Cheng, et al.
Published: (2025)
On Entropy Control in LLM-RL Algorithms
by: Shen, Han
Published: (2025)
Are Expressive Models Truly Necessary for Offline RL?
by: Wang, Guan, et al.
Published: (2024)
Decoupling KL and Trajectories: A Unified Perspective for SFT, DAgger, Offline RL, and OPD in LLM Distillation
by: Zhao, Anhao, et al.
Published: (2026)
Efficient Prompt Optimization Through the Lens of Best Arm Identification
by: Shi, Chengshuai, et al.
Published: (2024)
Selective Uncertainty Propagation in Offline RL
by: Krishnamurthy, Sanath Kumar, et al.
Published: (2023)
Harnessing the Power of Federated Learning in Federated Contextual Bandits
by: Shi, Chengshuai, et al.
Published: (2023)