:: Library Catalog

Bibliographic Details
Main Authors:	Xu, Yizhou; Beneventano, Pierfrancesco; Chuang, Isaac; Ziyin, Liu
Format:	Preprint
Published:	2026
Subjects:	Machine Learning; Optimization and Control
Online Access:	https://arxiv.org/abs/2602.05065

Similar Items

On the Trajectories of SGD Without Replacement
by: Beneventano, Pierfrancesco
Published: (2023)

Edge of Stochastic Stability: Revisiting the Edge of Stability for SGD
by: Andreyev, Arseniy, et al.
Published: (2024)

How Neural Networks Learn the Support is an Implicit Regularization Effect of SGD
by: Beneventano, Pierfrancesco, et al.
Published: (2024)

Gradient Descent Converges Linearly to Flatter Minima than Gradient Flow in Shallow Linear Networks
by: Beneventano, Pierfrancesco, et al.
Published: (2025)

Does Weight Decay Enhance Training Stability?
by: Saether, Marius, et al.
Published: (2026)

Too Sharp, Too Sure: When Calibration Follows Curvature
by: Morosini, Alessandro, et al.
Published: (2026)

Momentum Further Constrains Sharpness at the Edge of Stochastic Stability
by: Andreyev, Arseniy, et al.
Published: (2026)

Do Deep Networks Forget Initialization? A Forgetting-Time View of Practical Inductive Bias
by: Das, Mohua, et al.
Published: (2026)

Does SGD really happen in tiny subspaces?
by: Song, Minhak, et al.
Published: (2024)

SGD at the Edge of Stability: The Stochastic Sharpness Gap
by: Liao, Fangshuo, et al.
Published: (2026)

ROOT-SGD: Sharp Nonasymptotics and Near-Optimal Asymptotics in a Single Algorithm
by: Li, Chris Junchi, et al.
Published: (2020)

Scaling Laws of SignSGD in Linear Regression: When Does It Outperform SGD?
by: Kim, Jihwan, et al.
Published: (2026)

Schrödinger Bridge with Quadratic State Cost is Exactly Solvable
by: Teter, Alexis M. H., et al.
Published: (2024)

Sharp High-Probability Rates for Nonlinear SGD under Heavy-Tailed Noise via Symmetrization
by: Armacki, Aleksandar, et al.
Published: (2025)

Weyl Calculus and Exactly Solvable Schrödinger Bridges with Quadratic State Cost
by: Teter, Alexis M. H., et al.
Published: (2024)

Does Worst-Performing Agent Lead the Pack? Analyzing Agent Dynamics in Unified Distributed SGD
by: Hu, Jie, et al.
Published: (2024)

SLowcal-SGD: Slow Query Points Improve Local-SGD for Stochastic Convex Optimization
by: Dahan, Tehila, et al.
Published: (2023)

Faster Convergence of Local SGD for Over-Parameterized Models
by: Qin, Tiancheng, et al.
Published: (2022)

StoSignSGD: Unbiased Structural Stochasticity Fixes SignSGD for Training Large Language Models
by: Yu, Dingzhi, et al.
Published: (2026)

Making SGD Parameter-Free
by: Carmon, Yair, et al.
Published: (2022)

Parameter Symmetry and Noise Equilibrium of Stochastic Gradient Descent
by: Ziyin, Liu, et al.
Published: (2024)

The Optimality of (Accelerated) SGD for High-Dimensional Quadratic Optimization
by: Zhang, Haihan, et al.
Published: (2024)

A Hessian-Aware Stochastic Differential Equation for Modelling SGD
by: Li, Xiang, et al.
Published: (2024)

From PowerSGD to PowerSGD+: Low-Rank Gradient Compression for Distributed Optimization with Convergence Guarantees
by: Xie, Shengping, et al.
Published: (2025)

Shadowheart SGD: Distributed Asynchronous SGD with Optimal Time Complexity Under Arbitrary Computation and Communication Heterogeneity
by: Tyurin, Alexander, et al.
Published: (2024)

Dimension-adapted Momentum Outscales SGD
by: Ferbach, Damien, et al.
Published: (2025)

Heavy-Tail Phenomenon in Decentralized SGD
by: Gurbuzbalaban, Mert, et al.
Published: (2022)

Demystifying SGD with Doubly Stochastic Gradients
by: Kim, Kyurae, et al.
Published: (2024)

Diagonalisation SGD: Fast & Convergent SGD for Non-Differentiable Models via Reparameterisation and Smoothing
by: Wagner, Dominik, et al.
Published: (2024)

The Rich and the Simple: On the Implicit Bias of Adam and SGD
by: Vasudeva, Bhavya, et al.
Published: (2025)

Sign-SGD via Parameter-Free Optimization
by: Medyakov, Daniil, et al.
Published: (2025)

Can SGD Handle Heavy-Tailed Noise?
by: Fatkhullin, Ilyas, et al.
Published: (2025)

SGD with memory: fundamental properties and stochastic acceleration
by: Yarotsky, Dmitry, et al.
Published: (2024)

Byzantine-Robust Distributed SGD: A Unified Analysis and Tight Error Bounds
by: Ruan, Boyuan, et al.
Published: (2026)

Catapults in SGD: spikes in the training loss and their impact on generalization through feature learning
by: Zhu, Libin, et al.
Published: (2023)

Phases of Muon: When Muon Eclipses SignSGD
by: Paquette, Elliot, et al.
Published: (2026)

Optimal Projection-Free Adaptive SGD for Matrix Optimization
by: Kovalev, Dmitry
Published: (2026)

On the Provable Suboptimality of Momentum SGD in Nonstationary Stochastic Optimization
by: Sahu, Sharan, et al.
Published: (2026)

Accelerating Single-Pass SGD for Generalized Linear Prediction
by: Chen, Qian, et al.
Published: (2026)

From Gradient Clipping to Normalization for Heavy Tailed SGD
by: Hübler, Florian, et al.
Published: (2024)