| Main Authors: | Gong, Yuehu, Wang, Zeyuan, Chen, Yulin, Ding, Shutong, Zhou, Qingyuan, Fu, Yanwei |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2603.21621 |
Similar Items
Stochastic MeanFlow Policies: One-Step Generative Control with Entropic Mirror Descent
by: Wang, Zeyuan, et al.
Published: (2026)
Distributional Reinforcement Learning with Diffusion Bridge Critics
by: Ding, Shutong, et al.
Published: (2026)
One-Step Generative Policies with Q-Learning: A Reformulation of MeanFlow
by: Wang, Zeyuan, et al.
Published: (2025)
Variational Online Mirror Descent for Robust Learning in Schrödinger Bridge
by: Han, Dong-Sig, et al.
Published: (2025)
One-Step Flow Policy Mirror Descent
by: Chen, Tianyi, et al.
Published: (2025)
Value Mirror Descent for Reinforcement Learning
by: Jia, Zhichao, et al.
Published: (2026)
GenPO: Generative Diffusion Models Meet On-Policy Reinforcement Learning
by: Ding, Shutong, et al.
Published: (2025)
Policy Mirror Descent with Lookahead
by: Protopapas, Kimon, et al.
Published: (2024)
On the Effect of Regularization in Policy Mirror Descent
by: Kleuker, Jan Felix, et al.
Published: (2025)
Policy Mirror Descent with Temporal Difference Learning: Sample Complexity under Online Markov Data
by: Li, Wenye, et al.
Published: (2025)
On the Convergence of Policy in Unregularized Policy Mirror Descent
by: Lin, Dachao, et al.
Published: (2022)
Functional Acceleration for Policy Mirror Descent
by: Chelu, Veronica, et al.
Published: (2024)
FlowCritic: Bridging Value Estimation with Flow Matching in Reinforcement Learning
by: Zhong, Shan, et al.
Published: (2025)
Sample Complexity of Neural Policy Mirror Descent for Policy Optimization on Low-Dimensional Manifolds
by: Xu, Zhenghao, et al.
Published: (2023)
Mirror Descent on Reproducing Kernel Banach Spaces
by: Kumar, Akash, et al.
Published: (2024)
Mirror and Preconditioned Gradient Descent in Wasserstein Space
by: Bonet, Clément, et al.
Published: (2024)
Optimistic Online Mirror Descent for Bridging Stochastic and Adversarial Online Convex Optimization
by: Chen, Sijia, et al.
Published: (2023)
Diffusion-based Reinforcement Learning via Q-weighted Variational Policy Optimization
by: Ding, Shutong, et al.
Published: (2024)
Flow Matching Policy Optimization with Mirror Descent and Entropy Constraints
by: Gao, Ting, et al.
Published: (2026)
On the Convergence of Policy Mirror Descent with Temporal Difference Evaluation
by: Liu, Jiacai, et al.
Published: (2025)
A Mirror Descent Perspective of Smoothed Sign Descent
by: Wang, Shuyang, et al.
Published: (2024)
Convergence of Policy Mirror Descent Beyond Compatible Function Approximation
by: Sherman, Uri, et al.
Published: (2025)
StaQ it! Growing neural networks for Policy Mirror Descent
by: Shilova, Alena, et al.
Published: (2025)
Beyond State-Wise Mirror Descent: Offline Policy Optimization with Parametric Policies
by: Li, Xiang, et al.
Published: (2026)
A Novel Framework for Policy Mirror Descent with General Parameterization and Linear Convergence
by: Alfano, Carlo, et al.
Published: (2023)
Never Saddle for Reparameterized Steepest Descent as Mirror Flow
by: Jacobs, Tom, et al.
Published: (2026)
Estimating Individual Dose-Response Curves under Unobserved Confounders from Observational Data
by: Chen, Shutong, et al.
Published: (2024)
Iterative Refinement of Flow Policies in Probability Space for Online Reinforcement Learning
by: Sun, Mingyang, et al.
Published: (2025)
Nonstationary Generalized Linear Bandits with Discounted Online Mirror Descent
by: Lee, Joongkyu, et al.
Published: (2026)
Parameter-free Mirror Descent
by: Jacobsen, Andrew, et al.
Published: (2022)
Mirror Descent on Riemannian Manifolds
by: Jiang, Jiaxin, et al.
Published: (2026)
Stress-Aware Learning under KL Drift via Trust-Decayed Mirror Descent
by: Raj, Gabriel Nixon
Published: (2025)
Mirror Descent Policy Optimisation for Robust Constrained Markov Decision Processes
by: Bossens, David M., et al.
Published: (2025)
Instance Generation for Meta-Black-Box Optimization through Latent Space Reverse Engineering
by: Wang, Chen, et al.
Published: (2025)
Learning Mixtures of Experts with EM: A Mirror Descent Perspective
by: Fruytier, Quentin, et al.
Published: (2024)
Mirror Descent Actor Critic via Bounded Advantage Learning
by: Iwaki, Ryo
Published: (2025)
Adaptively Perturbed Mirror Descent for Learning in Games
by: Abe, Kenshi, et al.
Published: (2023)
Extreme Value Policy Optimization for Safe Reinforcement Learning
by: Gao, Shiqing, et al.
Published: (2026)
Sample-Efficient Diffusion-based Reinforcement Learning with Critic Guidance
by: Ding, Shutong, et al.
Published: (2026)
The Hidden Cost of Approximation in Online Mirror Descent
by: Schlisselberg, Ofir, et al.
Published: (2025)