| Main Authors: | Wu, Zechen, Greenwald, Amy, Parr, Ronald |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2501.01774 |
Similar Items
Approximate Next Policy Sampling: Replacing Conservative Target Policy Updates in Deep RL
by: Sandhu, Dillon, et al.
Published: (2026)
A Unifying View of Coverage in Linear Off-Policy Evaluation
by: Amortila, Philip, et al.
Published: (2026)
Analysis of Off-Policy $n$-Step TD-Learning with Linear Function Approximation
by: Lim, Han-Dong, et al.
Published: (2025)
Analysis of Off-Policy Multi-Step TD-Learning with Linear Function Approximation
by: Lee, Donghwan
Published: (2024)
An Optimal Tightness Bound for the Simulation Lemma
by: Lobel, Sam, et al.
Published: (2024)
Zero Collapse: A Failure Mode of Policy Gradient Methods in Discontinuous Reward Environments
by: Kumar, Nishant, et al.
Published: (2026)
Adaptive Layerwise Perturbation: Unifying Off-Policy Corrections for LLM RL
by: Ye, Chenlu, et al.
Published: (2026)
Faster Linear Systems and Matrix Norm Approximation via Multi-level Sketched Preconditioning
by: Dereziński, Michał, et al.
Published: (2024)
Residual Off-Policy RL for Finetuning Behavior Cloning Policies
by: Ankile, Lars, et al.
Published: (2025)
Large Language Model Post-Training: A Unified View of Off-Policy and On-Policy Learning
by: Zhao, Shiwan, et al.
Published: (2026)
LLMs Can Learn to Reason Via Off-Policy RL
by: Ritter, Daniel, et al.
Published: (2026)
Policy Learning for Off-Dynamics RL with Deficient Support
by: Van, Linh Le Pham, et al.
Published: (2024)
Distributionally Robust Off-Dynamics Reinforcement Learning: Provable Efficiency with Linear Function Approximation
by: Liu, Zhishuai, et al.
Published: (2024)
Soft Policy Optimization: Online Off-Policy RL for Sequence Models
by: Cohen, Taco, et al.
Published: (2025)
On the Nystrom Approximation for Preconditioning in Kernel Machines
by: Abedsoltan, Amirhesam, et al.
Published: (2023)
PolarGrad: A Class of Matrix-Gradient Optimizers from a Unifying Preconditioning Perspective
by: Lau, Tim Tsz-Kit, et al.
Published: (2025)
Trust the Batch, On- or Off-Policy: Adaptive Policy Optimization for RL Post-Training
by: Fakoor, Rasool, et al.
Published: (2026)
Kernel Metric Learning for In-Sample Off-Policy Evaluation of Deterministic RL Policies
by: Lee, Haanvid, et al.
Published: (2024)
Provably Efficient RL under Episode-Wise Safety in Constrained MDPs with Linear Function Approximation
by: Kitamura, Toshinori, et al.
Published: (2025)
Stable Asynchrony: Variance-Controlled Off-Policy RL for LLMs
by: Huang, Luke J., et al.
Published: (2026)
Rethinking the Global Convergence of Softmax Policy Gradient with Linear Function Approximation
by: Lin, Max Qiushi, et al.
Published: (2025)
Linear Convergence of Entropy-Regularized Natural Policy Gradient with Linear Function Approximation
by: Cayci, Semih, et al.
Published: (2021)
Off-Policy Maximum Entropy RL with Future State and Action Visitation Measures
by: Bolland, Adrien, et al.
Published: (2024)
SOPE: Stabilizing Off-Policy Evaluation for Online RL with Prior Data
by: Romeo, Carlo, et al.
Published: (2026)
RL as Regressor: A Reinforcement Learning Approach for Function Approximation
by: Huang, Yongchao
Published: (2025)
Unifying On- and Off-Policy Variance Reduction Methods
by: Jeunen, Olivier
Published: (2026)
Bi-Level Policy Optimization with Nyström Hypergradients
by: Prakash, Arjun, et al.
Published: (2025)
Missing Old Logits in Asynchronous Agentic RL: Semantic Mismatch and Repair Methods for Off-Policy Correction
by: Guan, Zhong, et al.
Published: (2026)
Differentiating Through Integer Linear Programs with Quadratic Regularization and Davis-Yin Splitting
by: McKenzie, Daniel, et al.
Published: (2023)
SCOPE-RL: A Python Library for Offline Reinforcement Learning and Off-Policy Evaluation
by: Kiyohara, Haruka, et al.
Published: (2023)
Patch the Distribution Mismatch: RL Rewriting Agent for Stable Off-Policy SFT
by: Wang, Jiacheng, et al.
Published: (2026)
Efficient Recurrent Off-Policy RL Requires a Context-Encoder-Specific Learning Rate
by: Luo, Fan-Ming, et al.
Published: (2024)
Policy Regularized Distributionally Robust Markov Decision Processes with Linear Function Approximation
by: Gu, Jingwen, et al.
Published: (2025)
OMPO: A Unified Framework for RL under Policy and Dynamics Shifts
by: Luo, Yu, et al.
Published: (2024)
Statistical Inference for Temporal Difference Learning with Linear Function Approximation
by: Wu, Weichen, et al.
Published: (2024)
DyKAF: Dynamical Kronecker Approximation of the Fisher Information Matrix for Gradient Preconditioning
by: Yudin, Nikolay, et al.
Published: (2025)
RMNP: Row-Momentum Normalized Preconditioning for Scalable Matrix-Based Optimization
by: Deng, Shenyang, et al.
Published: (2026)
Asynchronous RLHF: Faster and More Efficient Off-Policy RL for Language Models
by: Noukhovitch, Michael, et al.
Published: (2024)
RL in Latent MDPs is Tractable: Online Guarantees via Off-Policy Evaluation
by: Kwon, Jeongyeol, et al.
Published: (2024)
Matrix Low-Rank Approximation For Policy Gradient Methods
by: Rozada, Sergio, et al.
Published: (2024)