| Main Authors: | Shu, Yao, Wei, Chenxing, Lin, Hongbin, Qiu, Shuang, Xiong, Hui |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2605.02469 |
Similar Items
On the Effect of Regularization in Policy Mirror Descent
by: Kleuker, Jan Felix, et al.
Published: (2025)
One-Step Flow Policy Mirror Descent
by: Chen, Tianyi, et al.
Published: (2025)
Forward KL Regularized Preference Optimization for Aligning Diffusion Policies
by: Shan, Zhao, et al.
Published: (2024)
Flow Matching Policy Optimization with Mirror Descent and Entropy Constraints
by: Gao, Ting, et al.
Published: (2026)
Policy Mirror Descent with Lookahead
by: Protopapas, Kimon, et al.
Published: (2024)
On the Convergence of Policy in Unregularized Policy Mirror Descent
by: Lin, Dachao, et al.
Published: (2022)
Sample Complexity of Neural Policy Mirror Descent for Policy Optimization on Low-Dimensional Manifolds
by: Xu, Zhenghao, et al.
Published: (2023)
Functional Acceleration for Policy Mirror Descent
by: Chelu, Veronica, et al.
Published: (2024)
Listwise Policy Optimization: Group-based RLVR as Target-Projection on the LLM Response Simplex
by: Qu, Yun, et al.
Published: (2026)
A New Kernel Regularity Condition for Distributed Mirror Descent: Broader Coverage and Simpler Analysis
by: Qiu, Junwen, et al.
Published: (2026)
Stochastic MeanFlow Policies: One-Step Generative Control with Entropic Mirror Descent
by: Wang, Zeyuan, et al.
Published: (2026)
Decouple before Integration: Test-time Synthesis of SFT and RLVR Task Vectors
by: Yuan, Chaohao, et al.
Published: (2026)
Entropy-Guided Multiplicative Updates: KL Projections for Multi-Factor Target Exposures
by: Qiu, Yimeng
Published: (2025)
Approximation of Log-Partition Function in Policy Mirror Descent Induces Implicit Regularization for LLM Post-Training
by: Xu, Zhenghao, et al.
Published: (2026)
KL-Regularized RLHF with Multiple Reference Models: Exact Solutions and Sample Complexity
by: Aminian, Gholamali, et al.
Published: (2025)
Stress-Aware Learning under KL Drift via Trust-Decayed Mirror Descent
by: Raj, Gabriel Nixon
Published: (2025)
Minimizing Weighted Counterfactual Regret with Optimistic Online Mirror Descent
by: Xu, Hang, et al.
Published: (2024)
On the Convergence of Policy Mirror Descent with Temporal Difference Evaluation
by: Liu, Jiacai, et al.
Published: (2025)
Stability and Robustness via Regularization: Bandit Inference via Regularized Stochastic Mirror Descent
by: Halder, Budhaditya, et al.
Published: (2026)
Policy Mirror Descent with Temporal Difference Learning: Sample Complexity under Online Markov Data
by: Li, Wenye, et al.
Published: (2025)
Fast Rates in $α$-Potential Games via Regularized Mirror Descent
by: Chen, Claire, et al.
Published: (2026)
A Unified Approach to Controlling Implicit Regularization via Mirror Descent
by: Sun, Haoyuan, et al.
Published: (2023)
Score-Regularized Joint Sampling with Importance Weights for Flow Matching
by: Liu, Xinshuang, et al.
Published: (2025)
Convergence of Policy Mirror Descent Beyond Compatible Function Approximation
by: Sherman, Uri, et al.
Published: (2025)
StaQ it! Growing neural networks for Policy Mirror Descent
by: Shilova, Alena, et al.
Published: (2025)
Beyond State-Wise Mirror Descent: Offline Policy Optimization with Parametric Policies
by: Li, Xiang, et al.
Published: (2026)
Mirror Descent Methods with Weighting Scheme for Outputs for Constrained Variational Inequality Problems
by: Alkousa, Mohammad S., et al.
Published: (2025)
Mirror Descent Policy Optimisation for Robust Constrained Markov Decision Processes
by: Bossens, David M., et al.
Published: (2025)
Energy-Weighted Flow Matching: Unlocking Continuous Normalizing Flows for Efficient and Scalable Boltzmann Sampling
by: Dern, Niclas, et al.
Published: (2025)
Shorter but not Worse: Frugal Reasoning via Easy Samples as Length Regularizers in Math RLVR
by: Bounhar, Abdelaziz, et al.
Published: (2025)
Parameter-free Mirror Descent
by: Jacobsen, Andrew, et al.
Published: (2022)
Mirror Descent on Riemannian Manifolds
by: Jiang, Jiaxin, et al.
Published: (2026)
A Mirror Descent Perspective of Smoothed Sign Descent
by: Wang, Shuyang, et al.
Published: (2024)
HEALing Entropy Collapse: Enhancing Exploration in Few-Shot RLVR via Hybrid-Domain Entropy Dynamics Alignment
by: Liu, Zhanyu, et al.
Published: (2026)
Shuffling the Stochastic Mirror Descent via Dual Lipschitz Continuity and Kernel Conditioning
by: Qiu, Junwen, et al.
Published: (2026)
Target Mirror Descent: A Unifying Framework for Solving Monotone Variational Inequalities
by: Chen, Yu-Wen, et al.
Published: (2026)
Taming Audio VAEs via Target-KL Regularization
by: Seetharaman, Prem, et al.
Published: (2026)
A Novel Framework for Policy Mirror Descent with General Parameterization and Linear Convergence
by: Alfano, Carlo, et al.
Published: (2023)
Finite-Particle Rates for Regularized Stein Variational Gradient Descent
by: He, Ye, et al.
Published: (2026)
Leave No One Undermined: Policy Targeting with Regret Aversion
by: Kitagawa, Toru, et al.
Published: (2025)