:: Library Catalog

Bibliographic Details
Main Authors:	Zhao, Shen-Yi; Shi, Chang-Wei; Xie, Yin-Peng; Li, Wu-Jun
Format:	Preprint
Published:	2020
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2007.13985

Similar Items

Increasing Batch Size Improves Convergence of Stochastic Gradient Descent with Momentum
by: Kamo, Keisuke, et al.
Published: (2025)

Global Momentum Compression for Sparse Communication in Distributed Learning
by: Shi, Chang-Wei, et al.
Published: (2019)

On the Generalization of Stochastic Gradient Descent with Momentum
by: Ramezani-Kebrya, Ali, et al.
Published: (2018)

Weighted Averaged Stochastic Gradient Descent: Asymptotic Normality and Optimality
by: Wei, Ziyang, et al.
Published: (2023)

Stochastic Gradient Descent with Momentum is Algorithmically Stable
by: Lei, Yunwen, et al.
Published: (2026)

Momentum Does Not Reduce Stochastic Noise in Stochastic Gradient Descent
by: Sato, Naoki, et al.
Published: (2024)

Adaptive Batch Sizes Using Non-Euclidean Gradient Noise Scales for Stochastic Sign and Spectral Descent
by: Naganuma, Hiroki, et al.
Published: (2026)

Faster Convergence of Riemannian Stochastic Gradient Descent with Increasing Batch Size
by: Oowada, Kanata, et al.
Published: (2025)

Beyond the Mean: Fisher-Orthogonal Projection for Natural Gradient Descent in Large Batch Training
by: Lu, Yishun, et al.
Published: (2025)

Ordered Momentum for Asynchronous SGD
by: Shi, Chang-Wei, et al.
Published: (2024)

Central Limit Theorems for Stochastic Gradient Descent Quantile Estimators
by: Wei, Ziyang, et al.
Published: (2025)

Grams: Gradient Descent with Adaptive Momentum Scaling
by: Cao, Yang, et al.
Published: (2024)

Statistical Guarantees for High-Dimensional Stochastic Gradient Descent
by: Li, Jiaqi, et al.
Published: (2025)

Increasing Both Batch Size and Learning Rate Accelerates Stochastic Gradient Descent
by: Umeda, Hikaru, et al.
Published: (2024)

Training-Time Batch Normalization Reshapes Local Partition Geometry in Piecewise-Affine Networks
by: Qi, Xuan, et al.
Published: (2026)

Towards Understanding the Generalizability of Delayed Stochastic Gradient Descent
by: Deng, Xiaoge, et al.
Published: (2023)

First and Second Order Approximations to Stochastic Gradient Descent Methods with Momentum Terms
by: Lu, Eric
Published: (2025)

Algorithmic Stability of Stochastic Gradient Descent with Momentum under Heavy-Tailed Noise
by: Dang, Thanh, et al.
Published: (2025)

Stochastic Smoothed Gradient Descent Ascent for Federated Minimax Optimization
by: Shen, Wei, et al.
Published: (2023)

Ordered Local Momentum for Asynchronous Distributed Learning under Arbitrary Delays
by: Shi, Chang-Wei, et al.
Published: (2026)

Stochastic Gradient Descent for Two-layer Neural Networks
by: Cao, Dinghao, et al.
Published: (2024)

Coupling-based Convergence Diagnostic and Stepsize Scheme for Stochastic Gradient Descent
by: Li, Xiang, et al.
Published: (2024)

Gradient Descent, Stochastic Optimization, and Other Tales
by: Lu, Jun
Published: (2022)

Full-Batch Gradient Descent Outperforms One-Pass SGD: Sample Complexity Separation in Single-Index Learning
by: Kovačević, Filip, et al.
Published: (2026)

Refining Covariance Matrix Estimation in Stochastic Gradient Descent Through Bias Reduction
by: Wei, Ziyang, et al.
Published: (2026)

Stochastic Adaptive Gradient Descent Without Descent
by: Aujol, Jean-François, et al.
Published: (2025)

Gradient Descent with Polyak's Momentum Finds Flatter Minima via Large Catapults
by: Phunyaphibarn, Prin, et al.
Published: (2023)

Parameter Symmetry and Noise Equilibrium of Stochastic Gradient Descent
by: Ziyin, Liu, et al.
Published: (2024)

Online Statistical Inference for Contextual Bandits via Stochastic Gradient Descent
by: Chang, Xiangyu, et al.
Published: (2022)

Compressed Decentralized Momentum Stochastic Gradient Methods for Nonconvex Optimization
by: Liu, Wei, et al.
Published: (2025)

Adaptive Batch Size and Learning Rate Scheduler for Stochastic Gradient Descent Based on Minimization of Stochastic First-order Oracle Complexity
by: Umeda, Hikaru, et al.
Published: (2025)

A Theoretical Analysis of Noise Geometry in Stochastic Gradient Descent
by: Wang, Mingze, et al.
Published: (2023)

AGGC: Adaptive Group Gradient Clipping for Stabilizing Large Language Model Training
by: Li, Zhiyuan, et al.
Published: (2026)

Distributed Gradient Descent for Functional Learning
by: Yu, Zhan, et al.
Published: (2023)

Trustworthiness of Stochastic Gradient Descent in Distributed Learning
by: Li, Hongyang, et al.
Published: (2024)

Improving Energy Natural Gradient Descent through Woodbury, Momentum, and Randomization
by: Guzmán-Cordero, Andrés, et al.
Published: (2025)

Robust Gradient Descent via Heavy-Ball Momentum with Predictive Extrapolation
by: Ali, Sarwan
Published: (2025)

Stochastic Gradient Descent with Adaptive Data
by: Che, Ethan, et al.
Published: (2024)

Stochastic Gradient Descent with Strategic Querying
by: Jiang, Nanfei, et al.
Published: (2025)

Adjacent Leader Decentralized Stochastic Gradient Descent
by: He, Haoze, et al.
Published: (2024)