:: Library Catalog

Bibliographic Details
Main Author:	Zhao, John
Format:	Preprint
Published:	2026
Subjects:	Machine Learning; Optimization and Control
Online Access:	https://arxiv.org/abs/2601.01306

Similar Items

Muon Optimizes Under Spectral Norm Constraints
by: Chen, Lizhang, et al.
Published: (2025)

Implicit Bias of Spectral Descent and Muon on Multiclass Separable Data
by: Fan, Chen, et al.
Published: (2025)

Muon Dynamics as a Spectral Wasserstein Flow
by: Peyré, Gabriel
Published: (2026)

Preconditioning Benefits of Spectral Orthogonalization in Muon
by: Ma, Jianhao, et al.
Published: (2026)

Muon in Associative Memory Learning: Training Dynamics and Scaling Laws
by: Li, Binghui, et al.
Published: (2026)

Phases of Muon: When Muon Eclipses SignSGD
by: Paquette, Elliot, et al.
Published: (2026)

MuonBP: Faster Muon via Block-Periodic Orthogonalization
by: Khaled, Ahmed, et al.
Published: (2025)

LiMuon: Light and Fast Muon Optimizer for Large Models
by: Huang, Feihu, et al.
Published: (2025)

Global Convergence and Rich Feature Learning in $L$-Layer Infinite-Width Neural Networks under $μ$P Parametrization
by: Chen, Zixiang, et al.
Published: (2025)

Convergence of Muon with Newton-Schulz
by: Kim, Gyu Yeol, et al.
Published: (2026)

Error Feedback for Muon and Friends
by: Gruntkowska, Kaja, et al.
Published: (2025)

Muon Converges under Heavy-Tailed Noise: Nonconvex Hölder-Smooth Empirical Risk Minimization
by: Iiduka, Hideaki
Published: (2026)

Insights on Muon from Simple Quadratics
by: Gonon, Antoine, et al.
Published: (2026)

Beyond Muon: MUD (MomentUm Decorrelation) for Faster Transformer Training
by: Southworth, Ben S., et al.
Published: (2026)

Muon Does Not Converge on Convex Lipschitz Functions
by: Parshakova, Tetiana, et al.
Published: (2026)

Muon is Provably Faster with Momentum Variance Reduction
by: Qian, Xun, et al.
Published: (2025)

Drop-Muon: Update Less, Converge Faster
by: Gruntkowska, Kaja, et al.
Published: (2025)

Beyond the Ideal: Analyzing the Inexact Muon Update
by: Shulgin, Egor, et al.
Published: (2025)

Improved Convergence Rates of Muon Optimizer for Nonconvex Optimization
by: Nagashima, Shuntaro, et al.
Published: (2026)

Lions and Muons: Optimization via Stochastic Frank-Wolfe
by: Sfyraki, Maria-Eleni, et al.
Published: (2025)

MiMuon: Mixed Muon Optimizer with Improved Generalization for Large Models
by: Huang, Feihu, et al.
Published: (2026)

The Newton-Muon Optimizer
by: Du, Zhehang, et al.
Published: (2026)

On the Convergence Analysis of Muon
by: Shen, Wei, et al.
Published: (2025)

AdaGrad Meets Muon: Adaptive Stepsizes for Orthogonal Updates
by: Zhang, Minxin, et al.
Published: (2025)

Implicit Regularization in Perturbed Deep Matrix Factorization: Spectral Conditions and Stability
by: Wang, Jingzhe, et al.
Published: (2026)

FedMuon: Federated Learning with Bias-corrected LMO-based Optimization
by: Takezawa, Yuki, et al.
Published: (2025)

Muon with Nesterov Momentum: Heavy-Tailed Noise and (Randomized) Inexact Polar Decomposition
by: Choudhury, Sayantan, et al.
Published: (2026)

Leveraging Coordinate Momentum in SignSGD and Muon: Memory-Optimized Zero-Order
by: Petrov, Egor, et al.
Published: (2025)

DeMuon: A Decentralized Muon for Matrix Optimization over Graphs
by: He, Chuan, et al.
Published: (2025)

Model-Free $μ$-Synthesis: A Nonsmooth Optimization Perspective
by: Keivan, Darioush, et al.
Published: (2024)

$μ^2$-SGD: Stable Stochastic Optimization via a Double Momentum Mechanism
by: Dahan, Tehila, et al.
Published: (2023)

Towards an Optimal Control Perspective of ResNet Training
by: Püttschneider, Jens, et al.
Published: (2025)

Gluon: Making Muon & Scion Great Again! (Bridging Theory and Practice of LMO-based Optimizers for LLMs)
by: Riabinin, Artem, et al.
Published: (2025)

Faster Stochastic Algorithms for Minimax Optimization under Polyak-Łojasiewicz Conditions
by: Chen, Lesi, et al.
Published: (2023)

On the Complexity of Finite-Sum Smooth Optimization under the Polyak-Łojasiewicz Condition
by: Bai, Yunyan, et al.
Published: (2024)

High-Probability Bounds for SGD under the Polyak-Lojasiewicz Condition with Markovian Noise
by: Kar, Avik, et al.
Published: (2026)

Anytime Training with Schedule-Free Spectral Optimization
by: Apte, Anuj, et al.
Published: (2026)

Towards Guided Descent: Optimization Algorithms for Training Neural Networks At Scale
by: Nagwekar, Ansh
Published: (2025)

The Sharpness Disparity Principle in Transformers for Accelerating Language Model Pre-Training
by: Wang, Jinbo, et al.
Published: (2025)

Towards a Systems Theory of Algorithms
by: Dörfler, Florian, et al.
Published: (2024)