| Main Author: | Tao, Zhikun |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2505.08179 |
Similar Items
Optimistic Model Rollouts for Pessimistic Offline Policy Optimization
by: Zhai, Yuanzhao, et al.
Published: (2024)
Conservative DDPG -- Pessimistic RL without Ensemble
by: Soffair, Nitsan, et al.
Published: (2024)
Chain-of-Goals Hierarchical Policy for Long-Horizon Offline Goal-Conditioned RL
by: Choi, Jinwoo, et al.
Published: (2026)
Pessimistic Causal Reinforcement Learning with Mediators for Confounded Offline Data
by: Wang, Danyang, et al.
Published: (2024)
Pessimistic Nonlinear Least-Squares Value Iteration for Offline Reinforcement Learning
by: Di, Qiwei, et al.
Published: (2023)
Pessimistic Value Iteration for Multi-Task Data Sharing in Offline Reinforcement Learning
by: Bai, Chenjia, et al.
Published: (2024)
Pessimistic Risk-Aware Policy Learning in Contextual Bandits
by: Wan, Yilong, et al.
Published: (2026)
Diverse Randomized Value Functions: A Provably Pessimistic Approach for Offline Reinforcement Learning
by: Yu, Xudong, et al.
Published: (2024)
Don't Trade Off Safety: Diffusion Regularization for Constrained Offline RL
by: Guo, Junyu, et al.
Published: (2025)
Reasoning Cache: Continual Improvement Over Long Horizons via Short-Horizon RL
by: Wu, Ian, et al.
Published: (2026)
GR-RL: Going Dexterous and Precise for Long-Horizon Robotic Manipulation
by: Li, Yunfei, et al.
Published: (2025)
FAWAC: Feasibility Informed Advantage Weighted Regression for Persistent Safety in Offline Reinforcement Learning
by: Koirala, Prajwal, et al.
Published: (2024)
Long-Horizon Model-Based Offline Reinforcement Learning Without Explicit Conservatism
by: Ni, Tianwei, et al.
Published: (2025)
Pessimistic Backward Policy for GFlowNets
by: Jang, Hyosoon, et al.
Published: (2024)
DyDiff: Long-Horizon Rollout via Dynamics Diffusion for Offline Reinforcement Learning
by: Zhao, Hanye, et al.
Published: (2024)
Improving Offline RL by Blending Heuristics
by: Geng, Sinong, et al.
Published: (2023)
Budgeting Counterfactual for Offline RL
by: Liu, Yao, et al.
Published: (2023)
The Optimal Token Baseline: Variance Reduction for Long-Horizon LLM-RL
by: Li, Yingru, et al.
Published: (2026)
Planning Transformer: Long-Horizon Offline Reinforcement Learning with Planning Tokens
by: Clinton, Joseph, et al.
Published: (2024)
Learning a Pessimistic Reward Model in RLHF
by: Xu, Yinglun, et al.
Published: (2025)
Hyperparameter Tuning Through Pessimistic Bilevel Optimization
by: Ustun, Meltem Apaydin, et al.
Published: (2024)
Adaptive Coarse-to-Fine Subgoal Refinement for Long-Horizon Offline Goal-Conditioned Reinforcement Learning
by: Ke, Kaiqiang, et al.
Published: (2026)
Toward Explainable Offline RL: Analyzing Representations in Intrinsically Motivated Decision Transformers
by: Guiducci, Leonardo, et al.
Published: (2025)
Selective Uncertainty Propagation in Offline RL
by: Krishnamurthy, Sanath Kumar, et al.
Published: (2023)
Decoupled Prioritized Resampling for Offline RL
by: Yue, Yang, et al.
Published: (2023)
Augmenting Offline RL with Unlabeled Data
by: Wang, Zhao, et al.
Published: (2024)
Skill Reuse as Compression in Agentic RL
by: Xu, Zhikun, et al.
Published: (2026)
When Are RL Hyperparameters Benign? A Study in Offline Goal-Conditioned RL
by: Töpperwien, Jan Malte, et al.
Published: (2026)
Accelerating Diffusion Planners in Offline RL via Reward-Aware Consistency Trajectory Distillation
by: Duan, Xintong, et al.
Published: (2025)
Making Offline RL Online: Collaborative World Models for Offline Visual Reinforcement Learning
by: Wang, Qi, et al.
Published: (2023)
Physics-informed RL for Maximal Safety Probability Estimation
by: Hoshino, Hikaru, et al.
Published: (2024)
A Case for Validation Buffer in Pessimistic Actor-Critic
by: Nauman, Michal, et al.
Published: (2024)
Solving Continual Offline RL through Selective Weights Activation on Aligned Spaces
by: Hu, Jifeng, et al.
Published: (2024)
An Empirical Study on the Effectiveness of Incorporating Offline RL As Online RL Subroutines
by: Su, Jianhai, et al.
Published: (2025)
ELMUR: External Layer Memory with Update/Rewrite for Long-Horizon RL Problems
by: Cherepanov, Egor, et al.
Published: (2025)
Horizon Reduction as Information Loss in Offline Reinforcement Learning
by: Nidadala, Uday Kumar, et al.
Published: (2025)
Offline Imitation Learning by Controlling the Effective Planning Horizon
by: Ahn, Hee-Jun, et al.
Published: (2024)
Pessimistic Iterative Planning with RNNs for Robust POMDPs
by: Galesloot, Maris F. L., et al.
Published: (2024)
Algorithmic Guarantees for Distilling Supervised and Offline RL Datasets
by: Gupta, Aaryan, et al.
Published: (2025)
Offline RL via Feature-Occupancy Gradient Ascent
by: Neu, Gergely, et al.
Published: (2024)