| Main Authors: | Zhao, Shen-Yi, Shi, Chang-Wei, Xie, Yin-Peng, Li, Wu-Jun |
|---|---|
| Format: | Preprint |
| Published: | 2020 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2007.13985 |
Similar Items
Increasing Batch Size Improves Convergence of Stochastic Gradient Descent with Momentum
by: Kamo, Keisuke, et al.
Published: (2025)
Global Momentum Compression for Sparse Communication in Distributed Learning
by: Shi, Chang-Wei, et al.
Published: (2019)
On the Generalization of Stochastic Gradient Descent with Momentum
by: Ramezani-Kebrya, Ali, et al.
Published: (2018)
Weighted Averaged Stochastic Gradient Descent: Asymptotic Normality and Optimality
by: Wei, Ziyang, et al.
Published: (2023)
Stochastic Gradient Descent with Momentum is Algorithmically Stable
by: Lei, Yunwen, et al.
Published: (2026)
Momentum Does Not Reduce Stochastic Noise in Stochastic Gradient Descent
by: Sato, Naoki, et al.
Published: (2024)
Adaptive Batch Sizes Using Non-Euclidean Gradient Noise Scales for Stochastic Sign and Spectral Descent
by: Naganuma, Hiroki, et al.
Published: (2026)
Faster Convergence of Riemannian Stochastic Gradient Descent with Increasing Batch Size
by: Oowada, Kanata, et al.
Published: (2025)
Beyond the Mean: Fisher-Orthogonal Projection for Natural Gradient Descent in Large Batch Training
by: Lu, Yishun, et al.
Published: (2025)
Ordered Momentum for Asynchronous SGD
by: Shi, Chang-Wei, et al.
Published: (2024)
Central Limit Theorems for Stochastic Gradient Descent Quantile Estimators
by: Wei, Ziyang, et al.
Published: (2025)
Grams: Gradient Descent with Adaptive Momentum Scaling
by: Cao, Yang, et al.
Published: (2024)
Statistical Guarantees for High-Dimensional Stochastic Gradient Descent
by: Li, Jiaqi, et al.
Published: (2025)
Increasing Both Batch Size and Learning Rate Accelerates Stochastic Gradient Descent
by: Umeda, Hikaru, et al.
Published: (2024)
Training-Time Batch Normalization Reshapes Local Partition Geometry in Piecewise-Affine Networks
by: Qi, Xuan, et al.
Published: (2026)
Towards Understanding the Generalizability of Delayed Stochastic Gradient Descent
by: Deng, Xiaoge, et al.
Published: (2023)
First and Second Order Approximations to Stochastic Gradient Descent Methods with Momentum Terms
by: Lu, Eric
Published: (2025)
Algorithmic Stability of Stochastic Gradient Descent with Momentum under Heavy-Tailed Noise
by: Dang, Thanh, et al.
Published: (2025)
Stochastic Smoothed Gradient Descent Ascent for Federated Minimax Optimization
by: Shen, Wei, et al.
Published: (2023)
Ordered Local Momentum for Asynchronous Distributed Learning under Arbitrary Delays
by: Shi, Chang-Wei, et al.
Published: (2026)
Stochastic Gradient Descent for Two-layer Neural Networks
by: Cao, Dinghao, et al.
Published: (2024)
Coupling-based Convergence Diagnostic and Stepsize Scheme for Stochastic Gradient Descent
by: Li, Xiang, et al.
Published: (2024)
Gradient Descent, Stochastic Optimization, and Other Tales
by: Lu, Jun
Published: (2022)
Full-Batch Gradient Descent Outperforms One-Pass SGD: Sample Complexity Separation in Single-Index Learning
by: Kovačević, Filip, et al.
Published: (2026)
Refining Covariance Matrix Estimation in Stochastic Gradient Descent Through Bias Reduction
by: Wei, Ziyang, et al.
Published: (2026)
Stochastic Adaptive Gradient Descent Without Descent
by: Aujol, Jean-François, et al.
Published: (2025)
Gradient Descent with Polyak's Momentum Finds Flatter Minima via Large Catapults
by: Phunyaphibarn, Prin, et al.
Published: (2023)
Parameter Symmetry and Noise Equilibrium of Stochastic Gradient Descent
by: Ziyin, Liu, et al.
Published: (2024)
Online Statistical Inference for Contextual Bandits via Stochastic Gradient Descent
by: Chang, Xiangyu, et al.
Published: (2022)
Compressed Decentralized Momentum Stochastic Gradient Methods for Nonconvex Optimization
by: Liu, Wei, et al.
Published: (2025)
Adaptive Batch Size and Learning Rate Scheduler for Stochastic Gradient Descent Based on Minimization of Stochastic First-order Oracle Complexity
by: Umeda, Hikaru, et al.
Published: (2025)
A Theoretical Analysis of Noise Geometry in Stochastic Gradient Descent
by: Wang, Mingze, et al.
Published: (2023)
AGGC: Adaptive Group Gradient Clipping for Stabilizing Large Language Model Training
by: Li, Zhiyuan, et al.
Published: (2026)
Distributed Gradient Descent for Functional Learning
by: Yu, Zhan, et al.
Published: (2023)
Trustworthiness of Stochastic Gradient Descent in Distributed Learning
by: Li, Hongyang, et al.
Published: (2024)
Improving Energy Natural Gradient Descent through Woodbury, Momentum, and Randomization
by: Guzmán-Cordero, Andrés, et al.
Published: (2025)
Robust Gradient Descent via Heavy-Ball Momentum with Predictive Extrapolation
by: Ali, Sarwan
Published: (2025)
Stochastic Gradient Descent with Adaptive Data
by: Che, Ethan, et al.
Published: (2024)
Stochastic Gradient Descent with Strategic Querying
by: Jiang, Nanfei, et al.
Published: (2025)
Adjacent Leader Decentralized Stochastic Gradient Descent
by: He, Haoze, et al.
Published: (2024)