| Main Authors: | Xu, Yizhou, Beneventano, Pierfrancesco, Chuang, Isaac, Ziyin, Liu |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.05065 |
Similar Items
On the Trajectories of SGD Without Replacement
by: Beneventano, Pierfrancesco
Published: (2023)
Edge of Stochastic Stability: Revisiting the Edge of Stability for SGD
by: Andreyev, Arseniy, et al.
Published: (2024)
How Neural Networks Learn the Support is an Implicit Regularization Effect of SGD
by: Beneventano, Pierfrancesco, et al.
Published: (2024)
Gradient Descent Converges Linearly to Flatter Minima than Gradient Flow in Shallow Linear Networks
by: Beneventano, Pierfrancesco, et al.
Published: (2025)
Does Weight Decay Enhance Training Stability?
by: Saether, Marius, et al.
Published: (2026)
Too Sharp, Too Sure: When Calibration Follows Curvature
by: Morosini, Alessandro, et al.
Published: (2026)
Momentum Further Constrains Sharpness at the Edge of Stochastic Stability
by: Andreyev, Arseniy, et al.
Published: (2026)
Do Deep Networks Forget Initialization? A Forgetting-Time View of Practical Inductive Bias
by: Das, Mohua, et al.
Published: (2026)
Does SGD really happen in tiny subspaces?
by: Song, Minhak, et al.
Published: (2024)
SGD at the Edge of Stability: The Stochastic Sharpness Gap
by: Liao, Fangshuo, et al.
Published: (2026)
ROOT-SGD: Sharp Nonasymptotics and Near-Optimal Asymptotics in a Single Algorithm
by: Li, Chris Junchi, et al.
Published: (2020)
Scaling Laws of SignSGD in Linear Regression: When Does It Outperform SGD?
by: Kim, Jihwan, et al.
Published: (2026)
Schrödinger Bridge with Quadratic State Cost is Exactly Solvable
by: Teter, Alexis M. H., et al.
Published: (2024)
Sharp High-Probability Rates for Nonlinear SGD under Heavy-Tailed Noise via Symmetrization
by: Armacki, Aleksandar, et al.
Published: (2025)
Weyl Calculus and Exactly Solvable Schrödinger Bridges with Quadratic State Cost
by: Teter, Alexis M. H., et al.
Published: (2024)
Does Worst-Performing Agent Lead the Pack? Analyzing Agent Dynamics in Unified Distributed SGD
by: Hu, Jie, et al.
Published: (2024)
SLowcal-SGD: Slow Query Points Improve Local-SGD for Stochastic Convex Optimization
by: Dahan, Tehila, et al.
Published: (2023)
Faster Convergence of Local SGD for Over-Parameterized Models
by: Qin, Tiancheng, et al.
Published: (2022)
StoSignSGD: Unbiased Structural Stochasticity Fixes SignSGD for Training Large Language Models
by: Yu, Dingzhi, et al.
Published: (2026)
Making SGD Parameter-Free
by: Carmon, Yair, et al.
Published: (2022)
Parameter Symmetry and Noise Equilibrium of Stochastic Gradient Descent
by: Ziyin, Liu, et al.
Published: (2024)
The Optimality of (Accelerated) SGD for High-Dimensional Quadratic Optimization
by: Zhang, Haihan, et al.
Published: (2024)
A Hessian-Aware Stochastic Differential Equation for Modelling SGD
by: Li, Xiang, et al.
Published: (2024)
From PowerSGD to PowerSGD+: Low-Rank Gradient Compression for Distributed Optimization with Convergence Guarantees
by: Xie, Shengping, et al.
Published: (2025)
Shadowheart SGD: Distributed Asynchronous SGD with Optimal Time Complexity Under Arbitrary Computation and Communication Heterogeneity
by: Tyurin, Alexander, et al.
Published: (2024)
Dimension-adapted Momentum Outscales SGD
by: Ferbach, Damien, et al.
Published: (2025)
Heavy-Tail Phenomenon in Decentralized SGD
by: Gurbuzbalaban, Mert, et al.
Published: (2022)
Demystifying SGD with Doubly Stochastic Gradients
by: Kim, Kyurae, et al.
Published: (2024)
Diagonalisation SGD: Fast & Convergent SGD for Non-Differentiable Models via Reparameterisation and Smoothing
by: Wagner, Dominik, et al.
Published: (2024)
The Rich and the Simple: On the Implicit Bias of Adam and SGD
by: Vasudeva, Bhavya, et al.
Published: (2025)
Sign-SGD via Parameter-Free Optimization
by: Medyakov, Daniil, et al.
Published: (2025)
Can SGD Handle Heavy-Tailed Noise?
by: Fatkhullin, Ilyas, et al.
Published: (2025)
SGD with memory: fundamental properties and stochastic acceleration
by: Yarotsky, Dmitry, et al.
Published: (2024)
Byzantine-Robust Distributed SGD: A Unified Analysis and Tight Error Bounds
by: Ruan, Boyuan, et al.
Published: (2026)
Catapults in SGD: spikes in the training loss and their impact on generalization through feature learning
by: Zhu, Libin, et al.
Published: (2023)
Phases of Muon: When Muon Eclipses SignSGD
by: Paquette, Elliot, et al.
Published: (2026)
Optimal Projection-Free Adaptive SGD for Matrix Optimization
by: Kovalev, Dmitry
Published: (2026)
On the Provable Suboptimality of Momentum SGD in Nonstationary Stochastic Optimization
by: Sahu, Sharan, et al.
Published: (2026)
Accelerating Single-Pass SGD for Generalized Linear Prediction
by: Chen, Qian, et al.
Published: (2026)
From Gradient Clipping to Normalization for Heavy Tailed SGD
by: Hübler, Florian, et al.
Published: (2024)