| Main Authors: | Chen, Lizhang; Li, Jonathan; Liang, Kaizhao; Su, Baiyu; Xie, Cong; Pierse, Nuo Wang; Liang, Chen; Lao, Ni; Liu, Qiang |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2510.12402 |
Similar Items
Lion Secretly Solves Constrained Optimization: As Lyapunov Predicts
by: Chen, Lizhang, et al.
Published: (2023)
$ϕ$-Balancing for Mixture-of-Experts Training
by: Chen, Lizhang, et al.
Published: (2026)
Muon Optimizes Under Spectral Norm Constraints
by: Chen, Lizhang, et al.
Published: (2025)
Communication Efficient Distributed Training with Distributed Lion
by: Liu, Bo, et al.
Published: (2024)
Cautious Optimizers: Improving Training with One Line of Code
by: Liang, Kaizhao, et al.
Published: (2024)
CLion: Efficient Cautious Lion Optimizer with Enhanced Generalization
by: Huang, Feihu, et al.
Published: (2026)
Does Weight Decay Enhance Training Stability?
by: Saether, Marius, et al.
Published: (2026)
Training-Free Looped Transformers
by: Chen, Lizhang, et al.
Published: (2026)
Wide Neural Networks Trained with Weight Decay Provably Exhibit Neural Collapse
by: Jacot, Arthur, et al.
Published: (2024)
Memory-Efficient LLM Training with Online Subspace Descent
by: Liang, Kaizhao, et al.
Published: (2024)
Propagation of Chaos in Contextual Flow Maps
by: Chen, Shi, et al.
Published: (2026)
The Local Landscape of Phase Retrieval Under Limited Samples
by: Liu, Kaizhao, et al.
Published: (2023)
A Weighted Gradient Tracking Privacy-Preserving Method for Distributed Optimization
by: Xie, Furan, et al.
Published: (2025)
Modified Meta-Thompson Sampling for Linear Bandits and Its Bayes Regret Analysis
by: Li, Hao, et al.
Published: (2024)
Proximal Oracles for Optimization and Sampling
by: Liang, Jiaming, et al.
Published: (2024)
Solving Inverse Problems with Deep Linear Neural Networks: Global Convergence Guarantees for Gradient Descent with Weight Decay
by: Laus, Hannah, et al.
Published: (2025)
Faster Rates for No-Regret Learning in General Games via Cautious Optimism
by: Soleymani, Ashkan, et al.
Published: (2025)
On the Benefits of Weight Normalization for Overparameterized Matrix Sensing
by: Wei, Yudong, et al.
Published: (2025)
Cautious Optimism: A Meta-Algorithm for Near-Constant Regret in General Games
by: Soleymani, Ashkan, et al.
Published: (2025)
The Optimality of (Accelerated) SGD for High-Dimensional Quadratic Optimization
by: Zhang, Haihan, et al.
Published: (2024)
Accelerating Single-Pass SGD for Generalized Linear Prediction
by: Chen, Qian, et al.
Published: (2026)
Decoupled Weight Decay for Any $p$ Norm
by: Outmezguine, Nadav Joseph, et al.
Published: (2024)
Gradient descent in matrix factorization: Understanding large initialization
by: Chen, Hengchao, et al.
Published: (2023)
Variance Reduction and Low Sample Complexity in Stochastic Optimization via Proximal Point Method
by: Liang, Jiaming
Published: (2024)
A Unified Analysis for Finite Weight Averaging
by: Wang, Peng, et al.
Published: (2024)
Diffusion Model for Data-Driven Black-Box Optimization
by: Li, Zihao, et al.
Published: (2024)
Convergence rates of stochastic gradient method with independent sequences of step-size and momentum weight
by: Hwang, Wen-Liang
Published: (2024)
Offline Policy Learning with Weight Clipping and Heaviside Composite Optimization
by: Liu, Jingren, et al.
Published: (2026)
Trade-off in Estimating the Number of Byzantine Clients in Federated Learning
by: Chen, Ziyi, et al.
Published: (2025)
Tight Long-Term Tail Decay of (Clipped) SGD in Non-Convex Optimization
by: Armacki, Aleksandar, et al.
Published: (2026)
On the Stochastic (Variance-Reduced) Proximal Gradient Method for Regularized Expected Reward Optimization
by: Liang, Ling, et al.
Published: (2024)
A Provably Convergent and Practical Algorithm for Gromov-Wasserstein Optimal Transport
by: Liang, Ling, et al.
Published: (2026)
Stability and Generalization of Push-Sum Based Decentralized Optimization over Directed Graphs
by: Liang, Yifei, et al.
Published: (2026)
A Hessian-Aware Stochastic Differential Equation for Modelling SGD
by: Li, Xiang, et al.
Published: (2024)
Inertial Quadratic Majorization Minimization with Application to Kernel Regularized Learning
by: Heng, Qiang, et al.
Published: (2025)
An inexact Bregman proximal point method and its acceleration version for unbalanced optimal transport
by: Chen, Xiang, et al.
Published: (2024)
From PowerSGD to PowerSGD+: Low-Rank Gradient Compression for Distributed Optimization with Convergence Guarantees
by: Xie, Shengping, et al.
Published: (2025)
Bias and Extrapolation in Markovian Linear Stochastic Approximation with Constant Stepsizes
by: Huo, Dongyan, et al.
Published: (2022)
Langevin Multiplicative Weights Update with Applications in Polynomial Portfolio Management
by: Feng, Yi, et al.
Published: (2025)
Convergence of Sharpness-Aware Minimization Algorithms using Increasing Batch Size and Decaying Learning Rate
by: Harada, Hinata, et al.
Published: (2024)