:: Library Catalog

Bibliographic Details
Main Authors:	Chen, Lizhang; Li, Jonathan; Liang, Kaizhao; Su, Baiyu; Xie, Cong; Pierse, Nuo Wang; Liang, Chen; Lao, Ni; Liu, Qiang
Format:	Preprint
Published:	2025
Subjects:	Machine Learning; Optimization and Control
Online Access:	https://arxiv.org/abs/2510.12402

Similar Items

Lion Secretly Solves Constrained Optimization: As Lyapunov Predicts
by: Chen, Lizhang, et al.
Published: (2023)

$ϕ$-Balancing for Mixture-of-Experts Training
by: Chen, Lizhang, et al.
Published: (2026)

Muon Optimizes Under Spectral Norm Constraints
by: Chen, Lizhang, et al.
Published: (2025)

Communication Efficient Distributed Training with Distributed Lion
by: Liu, Bo, et al.
Published: (2024)

Cautious Optimizers: Improving Training with One Line of Code
by: Liang, Kaizhao, et al.
Published: (2024)

CLion: Efficient Cautious Lion Optimizer with Enhanced Generalization
by: Huang, Feihu, et al.
Published: (2026)

Does Weight Decay Enhance Training Stability?
by: Saether, Marius, et al.
Published: (2026)

Training-Free Looped Transformers
by: Chen, Lizhang, et al.
Published: (2026)

Wide Neural Networks Trained with Weight Decay Provably Exhibit Neural Collapse
by: Jacot, Arthur, et al.
Published: (2024)

Memory-Efficient LLM Training with Online Subspace Descent
by: Liang, Kaizhao, et al.
Published: (2024)

Propagation of Chaos in Contextual Flow Maps
by: Chen, Shi, et al.
Published: (2026)

The Local Landscape of Phase Retrieval Under Limited Samples
by: Liu, Kaizhao, et al.
Published: (2023)

A Weighted Gradient Tracking Privacy-Preserving Method for Distributed Optimization
by: Xie, Furan, et al.
Published: (2025)

Modified Meta-Thompson Sampling for Linear Bandits and Its Bayes Regret Analysis
by: Li, Hao, et al.
Published: (2024)

Proximal Oracles for Optimization and Sampling
by: Liang, Jiaming, et al.
Published: (2024)

Solving Inverse Problems with Deep Linear Neural Networks: Global Convergence Guarantees for Gradient Descent with Weight Decay
by: Laus, Hannah, et al.
Published: (2025)

Faster Rates for No-Regret Learning in General Games via Cautious Optimism
by: Soleymani, Ashkan, et al.
Published: (2025)

On the Benefits of Weight Normalization for Overparameterized Matrix Sensing
by: Wei, Yudong, et al.
Published: (2025)

Cautious Optimism: A Meta-Algorithm for Near-Constant Regret in General Games
by: Soleymani, Ashkan, et al.
Published: (2025)

The Optimality of (Accelerated) SGD for High-Dimensional Quadratic Optimization
by: Zhang, Haihan, et al.
Published: (2024)

Accelerating Single-Pass SGD for Generalized Linear Prediction
by: Chen, Qian, et al.
Published: (2026)

Decoupled Weight Decay for Any $p$ Norm
by: Outmezguine, Nadav Joseph, et al.
Published: (2024)

Gradient descent in matrix factorization: Understanding large initialization
by: Chen, Hengchao, et al.
Published: (2023)

Variance Reduction and Low Sample Complexity in Stochastic Optimization via Proximal Point Method
by: Liang, Jiaming
Published: (2024)

A Unified Analysis for Finite Weight Averaging
by: Wang, Peng, et al.
Published: (2024)

Diffusion Model for Data-Driven Black-Box Optimization
by: Li, Zihao, et al.
Published: (2024)

Convergence rates of stochastic gradient method with independent sequences of step-size and momentum weight
by: Hwang, Wen-Liang
Published: (2024)

Offline Policy Learning with Weight Clipping and Heaviside Composite Optimization
by: Liu, Jingren, et al.
Published: (2026)

Trade-off in Estimating the Number of Byzantine Clients in Federated Learning
by: Chen, Ziyi, et al.
Published: (2025)

Tight Long-Term Tail Decay of (Clipped) SGD in Non-Convex Optimization
by: Armacki, Aleksandar, et al.
Published: (2026)

On the Stochastic (Variance-Reduced) Proximal Gradient Method for Regularized Expected Reward Optimization
by: Liang, Ling, et al.
Published: (2024)

A Provably Convergent and Practical Algorithm for Gromov-Wasserstein Optimal Transport
by: Liang, Ling, et al.
Published: (2026)

Stability and Generalization of Push-Sum Based Decentralized Optimization over Directed Graphs
by: Liang, Yifei, et al.
Published: (2026)

A Hessian-Aware Stochastic Differential Equation for Modelling SGD
by: Li, Xiang, et al.
Published: (2024)

Inertial Quadratic Majorization Minimization with Application to Kernel Regularized Learning
by: Heng, Qiang, et al.
Published: (2025)

An inexact Bregman proximal point method and its acceleration version for unbalanced optimal transport
by: Chen, Xiang, et al.
Published: (2024)

From PowerSGD to PowerSGD+: Low-Rank Gradient Compression for Distributed Optimization with Convergence Guarantees
by: Xie, Shengping, et al.
Published: (2025)

Bias and Extrapolation in Markovian Linear Stochastic Approximation with Constant Stepsizes
by: Huo, Dongyan, et al.
Published: (2022)

Langevin Multiplicative Weights Update with Applications in Polynomial Portfolio Management
by: Feng, Yi, et al.
Published: (2025)

Convergence of Sharpness-Aware Minimization Algorithms using Increasing Batch Size and Decaying Learning Rate
by: Harada, Hinata, et al.
Published: (2024)