Bibliographic Details
Main Authors:	Huang, Ruiquan; Li, Donghao; Shi, Chengshuai; Shen, Cong; Yang, Jing
Format:	Preprint
Published:	2025
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2505.13768

Similar Items

Synthetic Data RL: Task Definition Is All You Need
by: Guo, Yiduo, et al.
Published: (2025)

Efficient Multi-objective Prompt Optimization via Pure-exploration Bandits
by: Li, Donghao, et al.
Published: (2026)

Augmenting Offline RL with Unlabeled Data
by: Wang, Zhao, et al.
Published: (2024)

Breaking the Computational Barrier: Provably Efficient Actor-Critic for Low-Rank MDPs
by: Huang, Ruiquan, et al.
Published: (2026)

$f$-Divergence Regularized RLHF: Two Tales of Sampling and Unified Analyses
by: Wu, Di, et al.
Published: (2026)

Robust Offline Reinforcement Learning for Non-Markovian Decision Processes
by: Huang, Ruiquan, et al.
Published: (2024)

Robust Policy Expansion for Offline-to-Online RL under Diverse Data Corruption
by: He, Longxiang, et al.
Published: (2025)

An Empirical Study on the Effectiveness of Incorporating Offline RL As Online RL Subroutines
by: Su, Jianhai, et al.
Published: (2025)

Behavior-Adaptive Q-Learning: A Unifying Framework for Offline-to-Online RL
by: Zu, Lipeng, et al.
Published: (2025)

Policy Agnostic RL: Offline RL and Online RL Fine-Tuning of Any Class and Backbone
by: Mark, Max Sobol, et al.
Published: (2024)

Don't Trade Off Safety: Diffusion Regularization for Constrained Offline RL
by: Guo, Junyu, et al.
Published: (2025)

Making Offline RL Online: Collaborative World Models for Offline Visual Reinforcement Learning
by: Wang, Qi, et al.
Published: (2023)

Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models
by: Li, Pengyi, et al.
Published: (2025)

Offline vs. Online Learning in Model-based RL: Lessons for Data Collection Strategies
by: Chen, Jiaqi, et al.
Published: (2025)

Decoupled Prioritized Resampling for Offline RL
by: Yue, Yang, et al.
Published: (2023)

H2O+: An Improved Framework for Hybrid Offline-and-Online RL with Dynamics Gaps
by: Niu, Haoyi, et al.
Published: (2023)

Algorithmic Guarantees for Distilling Supervised and Offline RL Datasets
by: Gupta, Aaryan, et al.
Published: (2025)

Greedy Sampling Is Provably Efficient for RLHF
by: Wu, Di, et al.
Published: (2025)

Provably Efficient UCB-type Algorithms For Learning Predictive State Representations
by: Huang, Ruiquan, et al.
Published: (2023)

Design Considerations in Offline Preference-based RL
by: Agarwal, Alekh, et al.
Published: (2025)

Transformers as Game Players: Provable In-context Game-playing Capabilities of Pre-trained Models
by: Shi, Chengshuai, et al.
Published: (2024)

Optimistic Critic Reconstruction and Constrained Fine-Tuning for General Offline-to-Online RL
by: Luo, Qin-Wen, et al.
Published: (2024)

Federated Online Prediction from Experts with Differential Privacy: Separations and Regret Speed-ups
by: Gao, Fengyu, et al.
Published: (2024)

A Natural Extension To Online Algorithms For Hybrid RL With Limited Coverage
by: Tan, Kevin, et al.
Published: (2024)

Diffusion Models as Optimizers for Efficient Planning in Offline RL
by: Huang, Renming, et al.
Published: (2024)

Solving Continual Offline RL through Selective Weights Activation on Aligned Spaces
by: Hu, Jifeng, et al.
Published: (2024)

Budgeting Counterfactual for Offline RL
by: Liu, Yao, et al.
Published: (2023)

SortedRL: Accelerating RL Training for LLMs through Online Length-Aware Scheduling
by: Zhang, Yiqi, et al.
Published: (2026)

Enhancing RL Generalizability in Robotics through SHAP Analysis of Algorithms and Hyperparameters
by: Kong, Lingxiao, et al.
Published: (2026)

Offline Inverse RL: New Solution Concepts and Provably Efficient Algorithms
by: Lazzati, Filippo, et al.
Published: (2024)

Improving Offline RL by Blending Heuristics
by: Geng, Sinong, et al.
Published: (2023)

When Are RL Hyperparameters Benign? A Study in Offline Goal-Conditioned RL
by: Töpperwien, Jan Malte, et al.
Published: (2026)

Action-Free Offline-to-Online RL via Discretised State Policies
by: Neggatu, Natinael Solomon, et al.
Published: (2026)

ToolRL: Reward is All Tool Learning Needs
by: Qian, Cheng, et al.
Published: (2025)

On Entropy Control in LLM-RL Algorithms
by: Shen, Han
Published: (2025)

Are Expressive Models Truly Necessary for Offline RL?
by: Wang, Guan, et al.
Published: (2024)

Decoupling KL and Trajectories: A Unified Perspective for SFT, DAgger, Offline RL, and OPD in LLM Distillation
by: Zhao, Anhao, et al.
Published: (2026)

Efficient Prompt Optimization Through the Lens of Best Arm Identification
by: Shi, Chengshuai, et al.
Published: (2024)

Selective Uncertainty Propagation in Offline RL
by: Krishnamurthy, Sanath Kumar, et al.
Published: (2023)

Harnessing the Power of Federated Learning in Federated Contextual Bandits
by: Shi, Chengshuai, et al.
Published: (2023)