:: Library Catalog

Bibliographic Details
Main Authors:	Bharthulwar, Sid; Tao, Stone; Su, Hao
Format:	Preprint
Published:	2025
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2511.21011

Similar Items

Reverse Forward Curriculum Learning for Extreme Sample and Demonstration Efficiency in Reinforcement Learning
by: Tao, Stone, et al.
Published: (2024)

Stochastic Resetting Accelerates Policy Convergence in Reinforcement Learning
by: Zhou, Jello, et al.
Published: (2026)

The Power of Resets in Online Reinforcement Learning
by: Mhammedi, Zakaria, et al.
Published: (2024)

A Reinforcement Learning based Reset Policy for CDCL SAT Solvers
by: Li, Chunxiao, et al.
Published: (2024)

Multi-Stage Manipulation with Demonstration-Augmented Reward, Policy, and World Model Learning
by: Escoriza, Adrià López, et al.
Published: (2025)

Policy Decorator: Model-Agnostic Online Refinement for Large Policy Model
by: Yuan, Xiu, et al.
Published: (2024)

Evolutionary Prompt Optimization Discovers Emergent Multimodal Reasoning Strategies in Vision-Language Models
by: Bharthulwar, Sid, et al.
Published: (2025)

Enabling Realtime Reinforcement Learning at Scale with Staggered Asynchronous Inference
by: Riemer, Matthew, et al.
Published: (2024)

StagFormer: Time Staggering Transformer Decoding for Running Layers In Parallel
by: Cutler, Dylan, et al.
Published: (2025)

On the Reuse Bias in Off-Policy Reinforcement Learning
by: Ying, Chengyang, et al.
Published: (2022)

Dataset Reset Policy Optimization for RLHF
by: Chang, Jonathan D., et al.
Published: (2024)

ManiSkill-HAB: A Benchmark for Low-Level Manipulation in Home Rearrangement Tasks
by: Shukla, Arth, et al.
Published: (2024)

Beyond Verifiable Rewards: Scaling Reinforcement Learning for Language Models to Unverifiable Data
by: Tang, Yunhao, et al.
Published: (2025)

Reset & Distill: A Recipe for Overcoming Negative Transfer in Continual Reinforcement Learning
by: Ahn, Hongjoon, et al.
Published: (2024)

The Impact of On-Policy Parallelized Data Collection on Deep Reinforcement Learning Networks
by: Mayor, Walter, et al.
Published: (2025)

Safe Reinforcement Learning using Action Projection: Safeguard the Policy or the Environment?
by: Markgraf, Hannah, et al.
Published: (2025)

Exchangeable Gaussian Processes for Staggered-Adoption Policy Evaluation
by: Gevorgyan, Hayk, et al.
Published: (2026)

Massively Parallel Expectation Maximization For Approximate Posteriors
by: Heap, Thomas, et al.
Published: (2025)

Massively Parallel Exact Inference for Hawkes Processes
by: Raza, Ahmer, et al.
Published: (2026)

Learning Massively Multitask World Models for Continuous Control
by: Hansen, Nicklas, et al.
Published: (2025)

Toward Information Theoretic Active Inverse Reinforcement Learning
by: Bajgar, Ondrej, et al.
Published: (2024)

Improving Policy Exploitation in Online Reinforcement Learning with Instant Retrospect Action
by: Gao, Gong, et al.
Published: (2026)

POLAR: Policy-based Layerwise Reinforcement Learning Method for Stealthy Backdoor Attacks in Federated Learning
by: Yu, Kuai, et al.
Published: (2025)

Self-Normalized Resets for Plasticity in Continual Learning
by: Farias, Vivek F., et al.
Published: (2024)

Interpret Policies in Deep Reinforcement Learning using SILVER with RL-Guided Labeling: A Model-level Approach to High-dimensional and Multi-action Environments
by: Qian, Yiyu, et al.
Published: (2025)

How Ensembles of Distilled Policies Improve Generalisation in Reinforcement Learning
by: Weltevrede, Max, et al.
Published: (2025)

Scaling Policy Gradient Quality-Diversity with Massive Parallelization via Behavioral Variations
by: Mitsides, Konstantinos, et al.
Published: (2025)

FLaRe: Achieving Masterful and Adaptive Robot Policies with Large-Scale Reinforcement Learning Fine-Tuning
by: Hu, Jiaheng, et al.
Published: (2024)

Massively Scalable Inverse Reinforcement Learning in Google Maps
by: Barnes, Matt, et al.
Published: (2023)

To Switch or Not to Switch? Balanced Policy Switching in Offline Reinforcement Learning
by: Ma, Tao, et al.
Published: (2024)

Dyna-LfLH: Learning Agile Navigation in Dynamic Environments from Learned Hallucination
by: Ghani, Saad Abdul, et al.
Published: (2024)

Reset It and Forget It: Relearning Last-Layer Weights Improves Continual and Transfer Learning
by: Frati, Lapo, et al.
Published: (2023)

Flow-Based Policy for Online Reinforcement Learning
by: Lv, Lei, et al.
Published: (2025)

Policy-Based Trajectory Clustering in Offline Reinforcement Learning
by: Hu, Hao, et al.
Published: (2025)

Deep Reinforcement Learning in Parameterized Action Space
by: Hausknecht, Matthew, et al.
Published: (2015)

Learning Without Time-Based Embodiment Resets in Soft-Actor Critic
by: Farrahi, Homayoon, et al.
Published: (2025)

Mastering Massive Multi-Task Reinforcement Learning via Mixture-of-Expert Decision Transformer
by: Kong, Yilun, et al.
Published: (2025)

Digital Twin-Enhanced Wireless Indoor Navigation: Achieving Efficient Environment Sensing with Zero-Shot Reinforcement Learning
by: Li, Tao, et al.
Published: (2023)

ReinFlow: Fine-tuning Flow Matching Policy with Online Reinforcement Learning
by: Zhang, Tonghe, et al.
Published: (2025)

On-Policy Policy Gradient Reinforcement Learning Without On-Policy Sampling
by: Corrado, Nicholas E., et al.
Published: (2023)