| Main Authors: | Geng, Sinong, Pacchiano, Aldo, Kolobov, Andrey, Cheng, Ching-An |
|---|---|
| Format: | Preprint |
| Published: | 2023 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2306.00321 |
Similar Items
Improved Training Mechanism for Reinforcement Learning via Online Model Selection
by: Afshar, Aida, et al.
Published: (2025)
Second Order Bounds for Contextual Bandits with Function Approximation
by: Pacchiano, Aldo
Published: (2024)
Meet Me at the Arm: The Cooperative Multi-Armed Bandits Problem with Shareable Arms
by: Hu, Xinyi, et al.
Published: (2025)
Adaptive Exploration for Multi-Reward Multi-Policy Evaluation
by: Russo, Alessio, et al.
Published: (2025)
Learning Rate-Free Reinforcement Learning: A Case for Model Selection with Non-Stationary Objectives
by: Afshar, Aida, et al.
Published: (2024)
In-Context Learning for Pure Exploration in Continuous Spaces
by: Russo, Alessio, et al.
Published: (2026)
Bayesian Online Model Selection
by: Afshar, Aida, et al.
Published: (2026)
Pure Exploration with Feedback Graphs
by: Russo, Alessio, et al.
Published: (2025)
Contextual Bandits with Stage-wise Constraints
by: Pacchiano, Aldo, et al.
Published: (2024)
Data-Driven Online Model Selection With Regret Guarantees
by: Pacchiano, Aldo, et al.
Published: (2023)
State-free Reinforcement Learning
by: Chen, Mingyu, et al.
Published: (2024)
In-Context Learning for Pure Exploration
by: Russo, Alessio, et al.
Published: (2025)
PRISE: LLM-Style Sequence Compression for Learning Temporal Action Abstractions in Control
by: Zheng, Ruijie, et al.
Published: (2024)
Multiple-policy Evaluation via Density Estimation
by: Chen, Yilei, et al.
Published: (2024)
Experiment Planning with Function Approximation
by: Pacchiano, Aldo, et al.
Published: (2024)
On the Hardness of Bandit Learning
by: Brukhim, Nataly, et al.
Published: (2025)
Language Model Personalization via Reward Factorization
by: Shenfeld, Idan, et al.
Published: (2025)
Provable Interactive Learning with Hindsight Instruction Feedback
by: Misra, Dipendra, et al.
Published: (2024)
The Good, the Bad, and the Sampled: a No-Regret Approach to Safe Online Classification
by: Baharav, Tavor Z., et al.
Published: (2025)
A Theoretical Framework for Partially Observed Reward-States in RLHF
by: Kausik, Chinmaya, et al.
Published: (2024)
Scaling In-Context Online Learning Capability of LLMs via Cross-Episode Meta-RL
by: Lin, Xiaofeng, et al.
Published: (2026)
Active Preference Optimization for Sample Efficient RLHF
by: Das, Nirjhar, et al.
Published: (2024)
Principled Fine-tuning of LLMs from User-Edits: A Medley of Preference, Supervision, and Reward
by: Misra, Dipendra, et al.
Published: (2026)
Streetwise Agents: Empowering Offline RL Policies to Outsmart Exogenous Stochastic Disturbances in RTC
by: Soni, Aditya, et al.
Published: (2024)
Offline-Boosted Actor-Critic: Adaptively Blending Optimal Historical Behaviors in Deep Off-Policy RL
by: Luo, Yu, et al.
Published: (2024)
When Less is Enough: Efficient Inference via Collaborative Reasoning
by: Chen, Yilei, et al.
Published: (2026)
ORSO: Accelerating Reward Design via Online Reward Selection and Policy Optimization
by: Zhang, Chen Bo Calvin, et al.
Published: (2024)
Budgeting Counterfactual for Offline RL
by: Liu, Yao, et al.
Published: (2023)
Improving and Accelerating Offline RL in Large Discrete Action Spaces with Structured Policy Initialization
by: Landers, Matthew, et al.
Published: (2026)
Belief-Based Offline Reinforcement Learning for Delay-Robust Policy Optimization
by: Zhan, Simon Sinong, et al.
Published: (2025)
Latent Policy Steering through One-Step Flow Policies
by: Im, Hokyun, et al.
Published: (2026)
Selective Uncertainty Propagation in Offline RL
by: Krishnamurthy, Sanath Kumar, et al.
Published: (2023)
Decoupled Prioritized Resampling for Offline RL
by: Yue, Yang, et al.
Published: (2023)
Augmenting Offline RL with Unlabeled Data
by: Wang, Zhao, et al.
Published: (2024)
When Are RL Hyperparameters Benign? A Study in Offline Goal-Conditioned RL
by: Töpperwien, Jan Malte, et al.
Published: (2026)
Making Offline RL Online: Collaborative World Models for Offline Visual Reinforcement Learning
by: Wang, Qi, et al.
Published: (2023)
How to Solve Contextual Goal-Oriented Problems with Offline Datasets?
by: Fan, Ying, et al.
Published: (2024)
An Empirical Study on the Effectiveness of Incorporating Offline RL As Online RL Subroutines
by: Su, Jianhai, et al.
Published: (2025)
Algorithmic Guarantees for Distilling Supervised and Offline RL Datasets
by: Gupta, Aaryan, et al.
Published: (2025)
Offline RL via Feature-Occupancy Gradient Ascent
by: Neu, Gergely, et al.
Published: (2024)