| Main Authors: | Bharthulwar, Sid; Tao, Stone; Su, Hao |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2511.21011 |
Similar Items
Reverse Forward Curriculum Learning for Extreme Sample and Demonstration Efficiency in Reinforcement Learning
by: Tao, Stone, et al.
Published: (2024)
Stochastic Resetting Accelerates Policy Convergence in Reinforcement Learning
by: Zhou, Jello, et al.
Published: (2026)
The Power of Resets in Online Reinforcement Learning
by: Mhammedi, Zakaria, et al.
Published: (2024)
A Reinforcement Learning based Reset Policy for CDCL SAT Solvers
by: Li, Chunxiao, et al.
Published: (2024)
Multi-Stage Manipulation with Demonstration-Augmented Reward, Policy, and World Model Learning
by: Escoriza, Adrià López, et al.
Published: (2025)
Policy Decorator: Model-Agnostic Online Refinement for Large Policy Model
by: Yuan, Xiu, et al.
Published: (2024)
Evolutionary Prompt Optimization Discovers Emergent Multimodal Reasoning Strategies in Vision-Language Models
by: Bharthulwar, Sid, et al.
Published: (2025)
Enabling Realtime Reinforcement Learning at Scale with Staggered Asynchronous Inference
by: Riemer, Matthew, et al.
Published: (2024)
StagFormer: Time Staggering Transformer Decoding for Running Layers In Parallel
by: Cutler, Dylan, et al.
Published: (2025)
On the Reuse Bias in Off-Policy Reinforcement Learning
by: Ying, Chengyang, et al.
Published: (2022)
Dataset Reset Policy Optimization for RLHF
by: Chang, Jonathan D., et al.
Published: (2024)
ManiSkill-HAB: A Benchmark for Low-Level Manipulation in Home Rearrangement Tasks
by: Shukla, Arth, et al.
Published: (2024)
Beyond Verifiable Rewards: Scaling Reinforcement Learning for Language Models to Unverifiable Data
by: Tang, Yunhao, et al.
Published: (2025)
Reset & Distill: A Recipe for Overcoming Negative Transfer in Continual Reinforcement Learning
by: Ahn, Hongjoon, et al.
Published: (2024)
The Impact of On-Policy Parallelized Data Collection on Deep Reinforcement Learning Networks
by: Mayor, Walter, et al.
Published: (2025)
Safe Reinforcement Learning using Action Projection: Safeguard the Policy or the Environment?
by: Markgraf, Hannah, et al.
Published: (2025)
Exchangeable Gaussian Processes for Staggered-Adoption Policy Evaluation
by: Gevorgyan, Hayk, et al.
Published: (2026)
Massively Parallel Expectation Maximization For Approximate Posteriors
by: Heap, Thomas, et al.
Published: (2025)
Massively Parallel Exact Inference for Hawkes Processes
by: Raza, Ahmer, et al.
Published: (2026)
Learning Massively Multitask World Models for Continuous Control
by: Hansen, Nicklas, et al.
Published: (2025)
Toward Information Theoretic Active Inverse Reinforcement Learning
by: Bajgar, Ondrej, et al.
Published: (2024)
Improving Policy Exploitation in Online Reinforcement Learning with Instant Retrospect Action
by: Gao, Gong, et al.
Published: (2026)
POLAR: Policy-based Layerwise Reinforcement Learning Method for Stealthy Backdoor Attacks in Federated Learning
by: Yu, Kuai, et al.
Published: (2025)
Self-Normalized Resets for Plasticity in Continual Learning
by: Farias, Vivek F., et al.
Published: (2024)
Interpret Policies in Deep Reinforcement Learning using SILVER with RL-Guided Labeling: A Model-level Approach to High-dimensional and Multi-action Environments
by: Qian, Yiyu, et al.
Published: (2025)
How Ensembles of Distilled Policies Improve Generalisation in Reinforcement Learning
by: Weltevrede, Max, et al.
Published: (2025)
Scaling Policy Gradient Quality-Diversity with Massive Parallelization via Behavioral Variations
by: Mitsides, Konstantinos, et al.
Published: (2025)
FLaRe: Achieving Masterful and Adaptive Robot Policies with Large-Scale Reinforcement Learning Fine-Tuning
by: Hu, Jiaheng, et al.
Published: (2024)
Massively Scalable Inverse Reinforcement Learning in Google Maps
by: Barnes, Matt, et al.
Published: (2023)
To Switch or Not to Switch? Balanced Policy Switching in Offline Reinforcement Learning
by: Ma, Tao, et al.
Published: (2024)
Dyna-LfLH: Learning Agile Navigation in Dynamic Environments from Learned Hallucination
by: Ghani, Saad Abdul, et al.
Published: (2024)
Reset It and Forget It: Relearning Last-Layer Weights Improves Continual and Transfer Learning
by: Frati, Lapo, et al.
Published: (2023)
Flow-Based Policy for Online Reinforcement Learning
by: Lv, Lei, et al.
Published: (2025)
Policy-Based Trajectory Clustering in Offline Reinforcement Learning
by: Hu, Hao, et al.
Published: (2025)
Deep Reinforcement Learning in Parameterized Action Space
by: Hausknecht, Matthew, et al.
Published: (2015)
Learning Without Time-Based Embodiment Resets in Soft-Actor Critic
by: Farrahi, Homayoon, et al.
Published: (2025)
Mastering Massive Multi-Task Reinforcement Learning via Mixture-of-Expert Decision Transformer
by: Kong, Yilun, et al.
Published: (2025)
Digital Twin-Enhanced Wireless Indoor Navigation: Achieving Efficient Environment Sensing with Zero-Shot Reinforcement Learning
by: Li, Tao, et al.
Published: (2023)
ReinFlow: Fine-tuning Flow Matching Policy with Online Reinforcement Learning
by: Zhang, Tonghe, et al.
Published: (2025)
On-Policy Policy Gradient Reinforcement Learning Without On-Policy Sampling
by: Corrado, Nicholas E., et al.
Published: (2023)