:: Library Catalog

Bibliographic Details
Main Authors:	Wu, Zechen; Greenwald, Amy; Parr, Ronald
Format:	Preprint
Published:	2025
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2501.01774

Similar Items

Approximate Next Policy Sampling: Replacing Conservative Target Policy Updates in Deep RL
by: Sandhu, Dillon, et al.
Published: (2026)

A Unifying View of Coverage in Linear Off-Policy Evaluation
by: Amortila, Philip, et al.
Published: (2026)

Analysis of Off-Policy $n$-Step TD-Learning with Linear Function Approximation
by: Lim, Han-Dong, et al.
Published: (2025)

Analysis of Off-Policy Multi-Step TD-Learning with Linear Function Approximation
by: Lee, Donghwan
Published: (2024)

An Optimal Tightness Bound for the Simulation Lemma
by: Lobel, Sam, et al.
Published: (2024)

Zero Collapse: A Failure Mode of Policy Gradient Methods in Discontinuous Reward Environments
by: Kumar, Nishant, et al.
Published: (2026)

Adaptive Layerwise Perturbation: Unifying Off-Policy Corrections for LLM RL
by: Ye, Chenlu, et al.
Published: (2026)

Faster Linear Systems and Matrix Norm Approximation via Multi-level Sketched Preconditioning
by: Dereziński, Michał, et al.
Published: (2024)

Residual Off-Policy RL for Finetuning Behavior Cloning Policies
by: Ankile, Lars, et al.
Published: (2025)

Large Language Model Post-Training: A Unified View of Off-Policy and On-Policy Learning
by: Zhao, Shiwan, et al.
Published: (2026)

LLMs Can Learn to Reason Via Off-Policy RL
by: Ritter, Daniel, et al.
Published: (2026)

Policy Learning for Off-Dynamics RL with Deficient Support
by: Van, Linh Le Pham, et al.
Published: (2024)

Distributionally Robust Off-Dynamics Reinforcement Learning: Provable Efficiency with Linear Function Approximation
by: Liu, Zhishuai, et al.
Published: (2024)

Soft Policy Optimization: Online Off-Policy RL for Sequence Models
by: Cohen, Taco, et al.
Published: (2025)

On the Nystrom Approximation for Preconditioning in Kernel Machines
by: Abedsoltan, Amirhesam, et al.
Published: (2023)

PolarGrad: A Class of Matrix-Gradient Optimizers from a Unifying Preconditioning Perspective
by: Lau, Tim Tsz-Kit, et al.
Published: (2025)

Trust the Batch, On- or Off-Policy: Adaptive Policy Optimization for RL Post-Training
by: Fakoor, Rasool, et al.
Published: (2026)

Kernel Metric Learning for In-Sample Off-Policy Evaluation of Deterministic RL Policies
by: Lee, Haanvid, et al.
Published: (2024)

Provably Efficient RL under Episode-Wise Safety in Constrained MDPs with Linear Function Approximation
by: Kitamura, Toshinori, et al.
Published: (2025)

Stable Asynchrony: Variance-Controlled Off-Policy RL for LLMs
by: Huang, Luke J., et al.
Published: (2026)

Rethinking the Global Convergence of Softmax Policy Gradient with Linear Function Approximation
by: Lin, Max Qiushi, et al.
Published: (2025)

Linear Convergence of Entropy-Regularized Natural Policy Gradient with Linear Function Approximation
by: Cayci, Semih, et al.
Published: (2021)

Off-Policy Maximum Entropy RL with Future State and Action Visitation Measures
by: Bolland, Adrien, et al.
Published: (2024)

SOPE: Stabilizing Off-Policy Evaluation for Online RL with Prior Data
by: Romeo, Carlo, et al.
Published: (2026)

RL as Regressor: A Reinforcement Learning Approach for Function Approximation
by: Huang, Yongchao
Published: (2025)

Unifying On- and Off-Policy Variance Reduction Methods
by: Jeunen, Olivier
Published: (2026)

Bi-Level Policy Optimization with Nyström Hypergradients
by: Prakash, Arjun, et al.
Published: (2025)

Missing Old Logits in Asynchronous Agentic RL: Semantic Mismatch and Repair Methods for Off-Policy Correction
by: Guan, Zhong, et al.
Published: (2026)

Differentiating Through Integer Linear Programs with Quadratic Regularization and Davis-Yin Splitting
by: McKenzie, Daniel, et al.
Published: (2023)

SCOPE-RL: A Python Library for Offline Reinforcement Learning and Off-Policy Evaluation
by: Kiyohara, Haruka, et al.
Published: (2023)

Patch the Distribution Mismatch: RL Rewriting Agent for Stable Off-Policy SFT
by: Wang, Jiacheng, et al.
Published: (2026)

Efficient Recurrent Off-Policy RL Requires a Context-Encoder-Specific Learning Rate
by: Luo, Fan-Ming, et al.
Published: (2024)

Policy Regularized Distributionally Robust Markov Decision Processes with Linear Function Approximation
by: Gu, Jingwen, et al.
Published: (2025)

OMPO: A Unified Framework for RL under Policy and Dynamics Shifts
by: Luo, Yu, et al.
Published: (2024)

Statistical Inference for Temporal Difference Learning with Linear Function Approximation
by: Wu, Weichen, et al.
Published: (2024)

DyKAF: Dynamical Kronecker Approximation of the Fisher Information Matrix for Gradient Preconditioning
by: Yudin, Nikolay, et al.
Published: (2025)

RMNP: Row-Momentum Normalized Preconditioning for Scalable Matrix-Based Optimization
by: Deng, Shenyang, et al.
Published: (2026)

Asynchronous RLHF: Faster and More Efficient Off-Policy RL for Language Models
by: Noukhovitch, Michael, et al.
Published: (2024)

RL in Latent MDPs is Tractable: Online Guarantees via Off-Policy Evaluation
by: Kwon, Jeongyeol, et al.
Published: (2024)

Matrix Low-Rank Approximation For Policy Gradient Methods
by: Rozada, Sergio, et al.
Published: (2024)