| Main Authors: | Marcotte, Sibylle, Gribonval, Rémi, Peyré, Gabriel |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2405.12888 |
Similar Items
Abide by the Law and Follow the Flow: Conservation Laws for Gradient Flows
by: Marcotte, Sibylle, et al.
Published: (2023)
Transformative or Conservative? Conservation laws for ResNets and Transformers
by: Marcotte, Sibylle, et al.
Published: (2025)
Intrinsic training dynamics of deep neural networks
by: Marcotte, Sibylle, et al.
Published: (2025)
Muon Dynamics as a Spectral Wasserstein Flow
by: Peyré, Gabriel
Published: (2026)
Path-conditioned training: a principled way to rescale ReLU neural networks
by: Lebeurrier, Arthur, et al.
Published: (2026)
Robust Sublinear Convergence Rates for Iterative Bregman Projections
by: Peyré, Gabriel
Published: (2026)
Optimal and Diffusion Transports in Machine Learning
by: Peyré, Gabriel
Published: (2025)
Optimal Transport for Machine Learners
by: Peyré, Gabriel
Published: (2025)
On the global convergence of gradient descent for wide shallow models with bounded nonlinearities
by: Petit, Romain, et al.
Published: (2026)
Understanding the training of infinitely deep and wide ResNets with Conditional Optimal Transport
by: Barboni, Raphaël, et al.
Published: (2024)
Ultra-fast feature learning for the training of two-layer neural networks in the two-timescale regime
by: Barboni, Raphaël, et al.
Published: (2025)
Shuffling Momentum Gradient Algorithm for Convex Optimization
by: Tran, Trang H., et al.
Published: (2024)
Non-Euclidean Gradient Descent Operates at the Edge of Stability
by: Islamov, Rustem, et al.
Published: (2026)
Adaptive Optimization via Momentum on Variance-Normalized Gradients
by: Patitucci, Francisco, et al.
Published: (2026)
Momentum Does Not Reduce Stochastic Noise in Stochastic Gradient Descent
by: Sato, Naoki, et al.
Published: (2024)
Compressed Decentralized Momentum Stochastic Gradient Methods for Nonconvex Optimization
by: Liu, Wei, et al.
Published: (2025)
From Score Matching to Diffusion: A Fine-Grained Error Analysis in the Gaussian Setting
by: Hurault, Samuel, et al.
Published: (2025)
Training Infinitely Deep and Wide Transformers
by: Barboni, Raphaël, et al.
Published: (2026)
Policy Gradient with Second Order Momentum
by: Sun, Tianyu
Published: (2025)
Global Convergence of Natural Policy Gradient with Hessian-aided Momentum Variance Reduction
by: Feng, Jie, et al.
Published: (2024)
Gradient Descent with Polyak's Momentum Finds Flatter Minima via Large Catapults
by: Phunyaphibarn, Prin, et al.
Published: (2023)
First and Second Order Approximations to Stochastic Gradient Descent Methods with Momentum Terms
by: Lu, Eric
Published: (2025)
Understanding Gradient Orthogonalization for Deep Learning via Non-Euclidean Trust-Region Optimization
by: Kovalev, Dmitry
Published: (2025)
Adaptive Momentum and Nonlinear Damping for Neural Network Training
by: Karoni, Aikaterini, et al.
Published: (2026)
Scaling Laws for Gradient Descent and Sign Descent for Linear Bigram Models under Zipf's Law
by: Kunstner, Frederik, et al.
Published: (2025)
Neighbor-Sampling Based Momentum Stochastic Methods for Training Graph Neural Networks
by: Noel, Molly, et al.
Published: (2025)
GANs as Gradient Flows that Converge
by: Huang, Yu-Jui, et al.
Published: (2022)
Flowing Datasets with Wasserstein over Wasserstein Gradient Flows
by: Bonet, Clément, et al.
Published: (2025)
Algorithmic Stability of Stochastic Gradient Descent with Momentum under Heavy-Tailed Noise
by: Dang, Thanh, et al.
Published: (2025)
Safe Gradient Flow for Bilevel Optimization
by: Sharifi, Sina, et al.
Published: (2025)
WeightLoRA: Keep Only Necessary Adapters
by: Veprikov, Andrey, et al.
Published: (2025)
Gradient Descent Converges Linearly to Flatter Minima than Gradient Flow in Shallow Linear Networks
by: Beneventano, Pierfrancesco, et al.
Published: (2025)
Modified Loss of Momentum Gradient Descent: Fine-Grained Analysis
by: Cattaneo, Matias D., et al.
Published: (2025)
Improving Stochastic Cubic Newton with Momentum
by: Chayti, El Mahdi, et al.
Published: (2024)
Stochastic Difference-of-Convex Optimization with Momentum
by: Chayti, El Mahdi, et al.
Published: (2025)
Dimension-adapted Momentum Outscales SGD
by: Ferbach, Damien, et al.
Published: (2025)
Multi-Objective Optimization via Wasserstein-Fisher-Rao Gradient Flow
by: Ren, Yinuo, et al.
Published: (2023)
Gradient Flow Polarizes Softmax Outputs towards Low-Entropy Solutions
by: Varre, Aditya, et al.
Published: (2026)
Hessian-guided Perturbed Wasserstein Gradient Flows for Escaping Saddle Points
by: Yamamoto, Naoya, et al.
Published: (2025)
Grams: Gradient Descent with Adaptive Momentum Scaling
by: Cao, Yang, et al.
Published: (2024)