Library Catalog

Bibliographic Details
Main Authors:	Shu, Yao; Wei, Chenxing; Lin, Hongbin; Qiu, Shuang; Xiong, Hui
Format:	Preprint
Published:	2026
Subjects:	Machine Learning; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2605.02469

Similar Items

On the Effect of Regularization in Policy Mirror Descent
by: Kleuker, Jan Felix, et al.
Published: (2025)

One-Step Flow Policy Mirror Descent
by: Chen, Tianyi, et al.
Published: (2025)

Forward KL Regularized Preference Optimization for Aligning Diffusion Policies
by: Shan, Zhao, et al.
Published: (2024)

Flow Matching Policy Optimization with Mirror Descent and Entropy Constraints
by: Gao, Ting, et al.
Published: (2026)

Policy Mirror Descent with Lookahead
by: Protopapas, Kimon, et al.
Published: (2024)

On the Convergence of Policy in Unregularized Policy Mirror Descent
by: Lin, Dachao, et al.
Published: (2022)

Sample Complexity of Neural Policy Mirror Descent for Policy Optimization on Low-Dimensional Manifolds
by: Xu, Zhenghao, et al.
Published: (2023)

Functional Acceleration for Policy Mirror Descent
by: Chelu, Veronica, et al.
Published: (2024)

Listwise Policy Optimization: Group-based RLVR as Target-Projection on the LLM Response Simplex
by: Qu, Yun, et al.
Published: (2026)

A New Kernel Regularity Condition for Distributed Mirror Descent: Broader Coverage and Simpler Analysis
by: Qiu, Junwen, et al.
Published: (2026)

Stochastic MeanFlow Policies: One-Step Generative Control with Entropic Mirror Descent
by: Wang, Zeyuan, et al.
Published: (2026)

Decouple before Integration: Test-time Synthesis of SFT and RLVR Task Vectors
by: Yuan, Chaohao, et al.
Published: (2026)

Entropy-Guided Multiplicative Updates: KL Projections for Multi-Factor Target Exposures
by: Qiu, Yimeng
Published: (2025)

Approximation of Log-Partition Function in Policy Mirror Descent Induces Implicit Regularization for LLM Post-Training
by: Xu, Zhenghao, et al.
Published: (2026)

KL-Regularized RLHF with Multiple Reference Models: Exact Solutions and Sample Complexity
by: Aminian, Gholamali, et al.
Published: (2025)

Stress-Aware Learning under KL Drift via Trust-Decayed Mirror Descent
by: Raj, Gabriel Nixon
Published: (2025)

Minimizing Weighted Counterfactual Regret with Optimistic Online Mirror Descent
by: Xu, Hang, et al.
Published: (2024)

On the Convergence of Policy Mirror Descent with Temporal Difference Evaluation
by: Liu, Jiacai, et al.
Published: (2025)

Stability and Robustness via Regularization: Bandit Inference via Regularized Stochastic Mirror Descent
by: Halder, Budhaditya, et al.
Published: (2026)

Policy Mirror Descent with Temporal Difference Learning: Sample Complexity under Online Markov Data
by: Li, Wenye, et al.
Published: (2025)

Fast Rates in α-Potential Games via Regularized Mirror Descent
by: Chen, Claire, et al.
Published: (2026)

A Unified Approach to Controlling Implicit Regularization via Mirror Descent
by: Sun, Haoyuan, et al.
Published: (2023)

Score-Regularized Joint Sampling with Importance Weights for Flow Matching
by: Liu, Xinshuang, et al.
Published: (2025)

Convergence of Policy Mirror Descent Beyond Compatible Function Approximation
by: Sherman, Uri, et al.
Published: (2025)

StaQ it! Growing neural networks for Policy Mirror Descent
by: Shilova, Alena, et al.
Published: (2025)

Beyond State-Wise Mirror Descent: Offline Policy Optimization with Parametric Policies
by: Li, Xiang, et al.
Published: (2026)

Mirror Descent Methods with Weighting Scheme for Outputs for Constrained Variational Inequality Problems
by: Alkousa, Mohammad S., et al.
Published: (2025)

Mirror Descent Policy Optimisation for Robust Constrained Markov Decision Processes
by: Bossens, David M., et al.
Published: (2025)

Energy-Weighted Flow Matching: Unlocking Continuous Normalizing Flows for Efficient and Scalable Boltzmann Sampling
by: Dern, Niclas, et al.
Published: (2025)

Shorter but not Worse: Frugal Reasoning via Easy Samples as Length Regularizers in Math RLVR
by: Bounhar, Abdelaziz, et al.
Published: (2025)

Parameter-free Mirror Descent
by: Jacobsen, Andrew, et al.
Published: (2022)

Mirror Descent on Riemannian Manifolds
by: Jiang, Jiaxin, et al.
Published: (2026)

A Mirror Descent Perspective of Smoothed Sign Descent
by: Wang, Shuyang, et al.
Published: (2024)

HEALing Entropy Collapse: Enhancing Exploration in Few-Shot RLVR via Hybrid-Domain Entropy Dynamics Alignment
by: Liu, Zhanyu, et al.
Published: (2026)

Shuffling the Stochastic Mirror Descent via Dual Lipschitz Continuity and Kernel Conditioning
by: Qiu, Junwen, et al.
Published: (2026)

Target Mirror Descent: A Unifying Framework for Solving Monotone Variational Inequalities
by: Chen, Yu-Wen, et al.
Published: (2026)

Taming Audio VAEs via Target-KL Regularization
by: Seetharaman, Prem, et al.
Published: (2026)

A Novel Framework for Policy Mirror Descent with General Parameterization and Linear Convergence
by: Alfano, Carlo, et al.
Published: (2023)

Finite-Particle Rates for Regularized Stein Variational Gradient Descent
by: He, Ye, et al.
Published: (2026)

Leave No One Undermined: Policy Targeting with Regret Aversion
by: Kitagawa, Toru, et al.
Published: (2025)