:: Library Catalog

Bibliographic Details
Main Authors:	Geng, Sinong, Pacchiano, Aldo, Kolobov, Andrey, Cheng, Ching-An
Format:	Preprint
Published:	2023
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2306.00321

Similar Items

Improved Training Mechanism for Reinforcement Learning via Online Model Selection
by: Afshar, Aida, et al.
Published: (2025)

Second Order Bounds for Contextual Bandits with Function Approximation
by: Pacchiano, Aldo
Published: (2024)

Meet Me at the Arm: The Cooperative Multi-Armed Bandits Problem with Shareable Arms
by: Hu, Xinyi, et al.
Published: (2025)

Adaptive Exploration for Multi-Reward Multi-Policy Evaluation
by: Russo, Alessio, et al.
Published: (2025)

Learning Rate-Free Reinforcement Learning: A Case for Model Selection with Non-Stationary Objectives
by: Afshar, Aida, et al.
Published: (2024)

In-Context Learning for Pure Exploration in Continuous Spaces
by: Russo, Alessio, et al.
Published: (2026)

Bayesian Online Model Selection
by: Afshar, Aida, et al.
Published: (2026)

Pure Exploration with Feedback Graphs
by: Russo, Alessio, et al.
Published: (2025)

Contextual Bandits with Stage-wise Constraints
by: Pacchiano, Aldo, et al.
Published: (2024)

Data-Driven Online Model Selection With Regret Guarantees
by: Pacchiano, Aldo, et al.
Published: (2023)

State-free Reinforcement Learning
by: Chen, Mingyu, et al.
Published: (2024)

In-Context Learning for Pure Exploration
by: Russo, Alessio, et al.
Published: (2025)

PRISE: LLM-Style Sequence Compression for Learning Temporal Action Abstractions in Control
by: Zheng, Ruijie, et al.
Published: (2024)

Multiple-policy Evaluation via Density Estimation
by: Chen, Yilei, et al.
Published: (2024)

Experiment Planning with Function Approximation
by: Pacchiano, Aldo, et al.
Published: (2024)

On the Hardness of Bandit Learning
by: Brukhim, Nataly, et al.
Published: (2025)

Language Model Personalization via Reward Factorization
by: Shenfeld, Idan, et al.
Published: (2025)

Provable Interactive Learning with Hindsight Instruction Feedback
by: Misra, Dipendra, et al.
Published: (2024)

The Good, the Bad, and the Sampled: a No-Regret Approach to Safe Online Classification
by: Baharav, Tavor Z., et al.
Published: (2025)

A Theoretical Framework for Partially Observed Reward-States in RLHF
by: Kausik, Chinmaya, et al.
Published: (2024)

Scaling In-Context Online Learning Capability of LLMs via Cross-Episode Meta-RL
by: Lin, Xiaofeng, et al.
Published: (2026)

Active Preference Optimization for Sample Efficient RLHF
by: Das, Nirjhar, et al.
Published: (2024)

Principled Fine-tuning of LLMs from User-Edits: A Medley of Preference, Supervision, and Reward
by: Misra, Dipendra, et al.
Published: (2026)

Streetwise Agents: Empowering Offline RL Policies to Outsmart Exogenous Stochastic Disturbances in RTC
by: Soni, Aditya, et al.
Published: (2024)

Offline-Boosted Actor-Critic: Adaptively Blending Optimal Historical Behaviors in Deep Off-Policy RL
by: Luo, Yu, et al.
Published: (2024)

When Less is Enough: Efficient Inference via Collaborative Reasoning
by: Chen, Yilei, et al.
Published: (2026)

ORSO: Accelerating Reward Design via Online Reward Selection and Policy Optimization
by: Zhang, Chen Bo Calvin, et al.
Published: (2024)

Budgeting Counterfactual for Offline RL
by: Liu, Yao, et al.
Published: (2023)

Improving and Accelerating Offline RL in Large Discrete Action Spaces with Structured Policy Initialization
by: Landers, Matthew, et al.
Published: (2026)

Belief-Based Offline Reinforcement Learning for Delay-Robust Policy Optimization
by: Zhan, Simon Sinong, et al.
Published: (2025)

Latent Policy Steering through One-Step Flow Policies
by: Im, Hokyun, et al.
Published: (2026)

Selective Uncertainty Propagation in Offline RL
by: Krishnamurthy, Sanath Kumar, et al.
Published: (2023)

Decoupled Prioritized Resampling for Offline RL
by: Yue, Yang, et al.
Published: (2023)

Augmenting Offline RL with Unlabeled Data
by: Wang, Zhao, et al.
Published: (2024)

When Are RL Hyperparameters Benign? A Study in Offline Goal-Conditioned RL
by: Töpperwien, Jan Malte, et al.
Published: (2026)

Making Offline RL Online: Collaborative World Models for Offline Visual Reinforcement Learning
by: Wang, Qi, et al.
Published: (2023)

How to Solve Contextual Goal-Oriented Problems with Offline Datasets?
by: Fan, Ying, et al.
Published: (2024)

An Empirical Study on the Effectiveness of Incorporating Offline RL As Online RL Subroutines
by: Su, Jianhai, et al.
Published: (2025)

Algorithmic Guarantees for Distilling Supervised and Offline RL Datasets
by: Gupta, Aaryan, et al.
Published: (2025)

Offline RL via Feature-Occupancy Gradient Ascent
by: Neu, Gergely, et al.
Published: (2024)