| Main Authors: | Chen, Haoran, Wang, Wentao |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2604.25550 |
Similar Items
SignSGD with Federated Voting
by: Park, Chanho, et al.
Published: (2024)
Phases of Muon: When Muon Eclipses SignSGD
by: Paquette, Elliot, et al.
Published: (2026)
Scaling Laws of SignSGD in Linear Regression: When Does It Outperform SGD?
by: Kim, Jihwan, et al.
Published: (2026)
StoSignSGD: Unbiased Structural Stochasticity Fixes SignSGD for Training Large Language Models
by: Yu, Dingzhi, et al.
Published: (2026)
SignSGD with Federated Defense: Harnessing Adversarial Attacks through Gradient Sign Decoding
by: Park, Chanho, et al.
Published: (2024)
Leveraging Coordinate Momentum in SignSGD and Muon: Memory-Optimized Zero-Order
by: Petrov, Egor, et al.
Published: (2025)
Hierarchical Federated Learning with SignSGD: A Highly Communication-Efficient Approach
by: Kazemi, Amirreza, et al.
Published: (2026)
When and Why SignSGD Outperforms SGD: A Theoretical Study Based on $\ell_1$-norm Lower Bounds
by: Tao, Hongyi, et al.
Published: (2026)
Convergence Analysis of SGD under Expected Smoothness
by: Kawamoto, Yuta, et al.
Published: (2025)
Small Batch Size Training for Language Models: When Vanilla SGD Works, and Why Gradient Accumulation Is Wasteful
by: Marek, Martin, et al.
Published: (2025)
PCDP-SGD: Improving the Convergence of Differentially Private SGD via Projection in Advance
by: Sha, Haichao, et al.
Published: (2023)
Perfect Parallelization in Mini-Batch SGD with Classical Momentum Acceleration
by: Garg, Sachin, et al.
Published: (2026)
Stochastic-Sign SGD for Federated Learning with Theoretical Guarantees
by: Jin, Richeng, et al.
Published: (2020)
Sign-SGD via Parameter-Free Optimization
by: Medyakov, Daniil, et al.
Published: (2025)
Demystifying Why Local Aggregation Helps: Convergence Analysis of Hierarchical SGD
by: Wang, Jiayi, et al.
Published: (2020)
From PowerSGD to PowerSGD+: Low-Rank Gradient Compression for Distributed Optimization with Convergence Guarantees
by: Xie, Shengping, et al.
Published: (2025)
From One-Pass SGD to Data Reuse: Mini-Batch Scaling Laws in Sketched Linear Regression
by: Chen, Ziyan, et al.
Published: (2026)
On the Convergence of DP-SGD with Adaptive Clipping
by: Shulgin, Egor, et al.
Published: (2024)
Error estimates between SGD with momentum and underdamped Langevin diffusion
by: Guillin, Arnaud, et al.
Published: (2024)
SoftSignSGD(S3): An Enhanced Optimizer for Practical DNN Training and Loss Spikes Minimization Beyond Adam
by: Peng, Hanyang, et al.
Published: (2025)
The Marginal Value of Momentum for Small Learning Rate SGD
by: Wang, Runzhe, et al.
Published: (2023)
VAMO: Efficient Zeroth-Order Variance Reduction for SGD with Faster Convergence
by: Chen, Jiahe, et al.
Published: (2025)
SGD for Variational Inference: Tackling Unbounded Variance via Preconditioning and Dynamic Batching
by: Labarrière, Hippolyte, et al.
Published: (2026)
Diagonalisation SGD: Fast & Convergent SGD for Non-Differentiable Models via Reparameterisation and Smoothing
by: Wagner, Dominik, et al.
Published: (2024)
Faster Convergence of Local SGD for Over-Parameterized Models
by: Qin, Tiancheng, et al.
Published: (2022)
Global Convergence of SGD On Two Layer Neural Nets
by: Gopalani, Pulkit, et al.
Published: (2022)
High-Probability Convergence Guarantees of Decentralized SGD
by: Armacki, Aleksandar, et al.
Published: (2025)
A Comprehensive Framework for Analyzing the Convergence of Adam: Bridging the Gap with SGD
by: Jin, Ruinan, et al.
Published: (2024)
Optimal Growth Schedules for Batch Size and Learning Rate in SGD that Reduce SFO Complexity
by: Umeda, Hikaru, et al.
Published: (2025)
Switching the Loss Reduces the Cost in Batch (Offline) Reinforcement Learning
by: Ayoub, Alex, et al.
Published: (2024)
GT-Space: Enhancing Heterogeneous Collaborative Perception with Ground Truth Feature Space
by: Wang, Wentao, et al.
Published: (2026)
Fast Last-Iterate Convergence of SGD in the Smooth Interpolation Regime
by: Attia, Amit, et al.
Published: (2025)
Convergent Privacy Loss of Noisy-SGD without Convexity and Smoothness
by: Chien, Eli, et al.
Published: (2024)
Online Linear Programming with Batching
by: Xu, Haoran, et al.
Published: (2024)
Convergence, Sticking and Escape: Stochastic Dynamics Near Critical Points in SGD
by: Dudukalov, Dmitry, et al.
Published: (2025)
Hybrid Unsupervised Learning Strategy for Monitoring Industrial Batch Processes
by: Frey, Christian W.
Published: (2024)
Full-Batch Gradient Descent Outperforms One-Pass SGD: Sample Complexity Separation in Single-Index Learning
by: Kovačević, Filip, et al.
Published: (2026)
Noise is All You Need: Private Second-Order Convergence of Noisy SGD
by: Avdiukhin, Dmitrii, et al.
Published: (2024)
Convergence Bound and Critical Batch Size of Muon Optimizer
by: Sato, Naoki, et al.
Published: (2025)
Convergence of SGD for Training Neural Networks with Sliced Wasserstein Losses
by: Tanguy, Eloi
Published: (2023)