| Main Authors: | Wang, Lawrence, Roberts, Stephen J. |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2412.17613 |
Similar Items
Training Instabilities Induce Flatness Bias in Gradient Descent
by: Wang, Lawrence, et al.
Published: (2025)
Can Gradient Descent Simulate Prompting?
by: Zhang, Eric, et al.
Published: (2025)
Product-Stability: Provable Convergence for Gradient Descent on the Edge of Stability
by: Gan, Eric
Published: (2026)
Streaming Krylov-Accelerated Stochastic Gradient Descent
by: Thomas, Stephen
Published: (2025)
Understanding Gradient Descent through the Training Jacobian
by: Belrose, Nora, et al.
Published: (2024)
Can LLMs predict the convergence of Stochastic Gradient Descent?
by: Zekri, Oussama, et al.
Published: (2024)
On the Generalization of Stochastic Gradient Descent with Momentum
by: Ramezani-Kebrya, Ali, et al.
Published: (2018)
Non-Euclidean Gradient Descent Operates at the Edge of Stability
by: Islamov, Rustem, et al.
Published: (2026)
Outlier Gradient Analysis: Efficiently Identifying Detrimental Training Samples for Deep Learning Models
by: Chhabra, Anshuman, et al.
Published: (2024)
Generalization Guarantees of Gradient Descent for Multi-Layer Neural Networks
by: Wang, Puyu, et al.
Published: (2023)
Occam Gradient Descent
by: Kausik, B. N.
Published: (2024)
Gradient Descent Algorithm Survey
by: Fucheng, Deng, et al.
Published: (2025)
Generalization Bounds of Stochastic Gradient Descent in Homogeneous Neural Networks
by: Ma, Wenquan, et al.
Published: (2026)
First-ish Order Methods: Hessian-aware Scalings of Gradient Descent
by: Smee, Oscar, et al.
Published: (2025)
Can Looped Transformers Learn to Implement Multi-step Gradient Descent for In-context Learning?
by: Gatmiry, Khashayar, et al.
Published: (2024)
Characterizing Dynamical Stability of Stochastic Gradient Descent in Overparameterized Learning
by: Chemnitz, Dennis, et al.
Published: (2024)
Generalized Gradient Descent is a Hypergraph Functor
by: Hanks, Tyler, et al.
Published: (2024)
Neutron Reflectometry by Gradient Descent
by: Champneys, Max D., et al.
Published: (2025)
Convergence Rates for Gradient Descent on the Edge of Stability in Overparametrised Least Squares
by: MacDonald, Lachlan Ewen, et al.
Published: (2025)
Improving Energy Natural Gradient Descent through Woodbury, Momentum, and Randomization
by: Guzmán-Cordero, Andrés, et al.
Published: (2025)
Stochastic Adaptive Gradient Descent Without Descent
by: Aujol, Jean-François, et al.
Published: (2025)
Armijo Line-search Can Make (Stochastic) Gradient Descent Provably Faster
by: Vaswani, Sharan, et al.
Published: (2025)
Thermodynamic Natural Gradient Descent
by: Donatella, Kaelan, et al.
Published: (2024)
Stacking as Accelerated Gradient Descent
by: Agarwal, Naman, et al.
Published: (2024)
Corner Gradient Descent
by: Yarotsky, Dmitry
Published: (2025)
Adjacent Leader Decentralized Stochastic Gradient Descent
by: He, Haoze, et al.
Published: (2024)
On the Optimization and Generalization of Two-layer Transformers with Sign Gradient Descent
by: Li, Bingrui, et al.
Published: (2024)
Transformers Trained via Gradient Descent Can Provably Learn a Class of Teacher Models
by: Zhang, Chenyang, et al.
Published: (2026)
Controlling the Flow: Stability and Convergence for Stochastic Gradient Descent with Decaying Regularization
by: Kassing, Sebastian, et al.
Published: (2025)
Gradient Flossing: Improving Gradient Descent through Dynamic Control of Jacobians
by: Engelken, Rainer
Published: (2023)
Type-II Saddles and Probabilistic Stability of Stochastic Gradient Descent
by: Ziyin, Liu, et al.
Published: (2023)
Robust Gradient Descent for Phase Retrieval
by: Buna, Alex, et al.
Published: (2024)
Distributed Gradient Descent for Functional Learning
by: Yu, Zhan, et al.
Published: (2023)
Optimal Rates for Generalization of Gradient Descent for Deep ReLU Classification
by: Li, Yuanfan, et al.
Published: (2025)
Multiclass Loss Geometry Matters for Generalization of Gradient Descent in Separable Classification
by: Schliserman, Matan, et al.
Published: (2025)
Generalization Guarantees on Data-Driven Tuning of Gradient Descent with Langevin Updates
by: Goyal, Saumya, et al.
Published: (2026)
The Implicit Bias of Gradient Descent on Separable Multiclass Data
by: Ravi, Hrithik, et al.
Published: (2024)
Algorithmic Stability of Stochastic Gradient Descent with Momentum under Heavy-Tailed Noise
by: Dang, Thanh, et al.
Published: (2025)
Adaptive Conditional Gradient Descent
by: Khademi, Abbas, et al.
Published: (2025)
$k$-SVD with Gradient Descent
by: Jedra, Yassir, et al.
Published: (2025)