:: Library Catalog

Bibliographic Details
Main Authors:	Chen, Haoran; Wang, Wentao
Format:	Preprint
Published:	2026
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2604.25550

Similar Items

SignSGD with Federated Voting
by: Park, Chanho, et al.
Published: (2024)

Phases of Muon: When Muon Eclipses SignSGD
by: Paquette, Elliot, et al.
Published: (2026)

Scaling Laws of SignSGD in Linear Regression: When Does It Outperform SGD?
by: Kim, Jihwan, et al.
Published: (2026)

StoSignSGD: Unbiased Structural Stochasticity Fixes SignSGD for Training Large Language Models
by: Yu, Dingzhi, et al.
Published: (2026)

SignSGD with Federated Defense: Harnessing Adversarial Attacks through Gradient Sign Decoding
by: Park, Chanho, et al.
Published: (2024)

Leveraging Coordinate Momentum in SignSGD and Muon: Memory-Optimized Zero-Order
by: Petrov, Egor, et al.
Published: (2025)

Hierarchical Federated Learning with SignSGD: A Highly Communication-Efficient Approach
by: Kazemi, Amirreza, et al.
Published: (2026)

When and Why SignSGD Outperforms SGD: A Theoretical Study Based on $\ell_1$-norm Lower Bounds
by: Tao, Hongyi, et al.
Published: (2026)

Convergence Analysis of SGD under Expected Smoothness
by: Kawamoto, Yuta, et al.
Published: (2025)

Small Batch Size Training for Language Models: When Vanilla SGD Works, and Why Gradient Accumulation Is Wasteful
by: Marek, Martin, et al.
Published: (2025)

PCDP-SGD: Improving the Convergence of Differentially Private SGD via Projection in Advance
by: Sha, Haichao, et al.
Published: (2023)

Perfect Parallelization in Mini-Batch SGD with Classical Momentum Acceleration
by: Garg, Sachin, et al.
Published: (2026)

Stochastic-Sign SGD for Federated Learning with Theoretical Guarantees
by: Jin, Richeng, et al.
Published: (2020)

Sign-SGD via Parameter-Free Optimization
by: Medyakov, Daniil, et al.
Published: (2025)

Demystifying Why Local Aggregation Helps: Convergence Analysis of Hierarchical SGD
by: Wang, Jiayi, et al.
Published: (2020)

From PowerSGD to PowerSGD+: Low-Rank Gradient Compression for Distributed Optimization with Convergence Guarantees
by: Xie, Shengping, et al.
Published: (2025)

From One-Pass SGD to Data Reuse: Mini-Batch Scaling Laws in Sketched Linear Regression
by: Chen, Ziyan, et al.
Published: (2026)

On the Convergence of DP-SGD with Adaptive Clipping
by: Shulgin, Egor, et al.
Published: (2024)

Error estimates between SGD with momentum and underdamped Langevin diffusion
by: Guillin, Arnaud, et al.
Published: (2024)

SoftSignSGD(S3): An Enhanced Optimizer for Practical DNN Training and Loss Spikes Minimization Beyond Adam
by: Peng, Hanyang, et al.
Published: (2025)

The Marginal Value of Momentum for Small Learning Rate SGD
by: Wang, Runzhe, et al.
Published: (2023)

VAMO: Efficient Zeroth-Order Variance Reduction for SGD with Faster Convergence
by: Chen, Jiahe, et al.
Published: (2025)

SGD for Variational Inference: Tackling Unbounded Variance via Preconditioning and Dynamic Batching
by: Labarrière, Hippolyte, et al.
Published: (2026)

Diagonalisation SGD: Fast & Convergent SGD for Non-Differentiable Models via Reparameterisation and Smoothing
by: Wagner, Dominik, et al.
Published: (2024)

Faster Convergence of Local SGD for Over-Parameterized Models
by: Qin, Tiancheng, et al.
Published: (2022)

Global Convergence of SGD On Two Layer Neural Nets
by: Gopalani, Pulkit, et al.
Published: (2022)

High-Probability Convergence Guarantees of Decentralized SGD
by: Armacki, Aleksandar, et al.
Published: (2025)

A Comprehensive Framework for Analyzing the Convergence of Adam: Bridging the Gap with SGD
by: Jin, Ruinan, et al.
Published: (2024)

Optimal Growth Schedules for Batch Size and Learning Rate in SGD that Reduce SFO Complexity
by: Umeda, Hikaru, et al.
Published: (2025)

Switching the Loss Reduces the Cost in Batch (Offline) Reinforcement Learning
by: Ayoub, Alex, et al.
Published: (2024)

GT-Space: Enhancing Heterogeneous Collaborative Perception with Ground Truth Feature Space
by: Wang, Wentao, et al.
Published: (2026)

Fast Last-Iterate Convergence of SGD in the Smooth Interpolation Regime
by: Attia, Amit, et al.
Published: (2025)

Convergent Privacy Loss of Noisy-SGD without Convexity and Smoothness
by: Chien, Eli, et al.
Published: (2024)

Online Linear Programming with Batching
by: Xu, Haoran, et al.
Published: (2024)

Convergence, Sticking and Escape: Stochastic Dynamics Near Critical Points in SGD
by: Dudukalov, Dmitry, et al.
Published: (2025)

Hybrid Unsupervised Learning Strategy for Monitoring Industrial Batch Processes
by: Frey, Christian W.
Published: (2024)

Full-Batch Gradient Descent Outperforms One-Pass SGD: Sample Complexity Separation in Single-Index Learning
by: Kovačević, Filip, et al.
Published: (2026)

Noise is All You Need: Private Second-Order Convergence of Noisy SGD
by: Avdiukhin, Dmitrii, et al.
Published: (2024)

Convergence Bound and Critical Batch Size of Muon Optimizer
by: Sato, Naoki, et al.
Published: (2025)

Convergence of SGD for Training Neural Networks with Sliced Wasserstein Losses
by: Tanguy, Eloi
Published: (2023)