| Main Authors: | Compagnoni, Enea Monzio, Liu, Tianlin, Islamov, Rustem, Proske, Frank Norbert, Orvieto, Antonio, Lucchi, Aurelien |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2411.15958 |
Similar Items
Unbiased and Sign Compression in Distributed Learning: Comparing Noise Resilience via SDEs
by: Compagnoni, Enea Monzio, et al.
Published: (2025)
On the Interaction of Batch Noise, Adaptivity, and Compression, under $(L_0,L_1)$-Smoothness: An SDE Approach
by: Compagnoni, Enea Monzio, et al.
Published: (2025)
SDEs for Minimax Optimization
by: Compagnoni, Enea Monzio, et al.
Published: (2024)
Adaptive Methods Are Preferable in High Privacy Settings: An SDE Perspective
by: Compagnoni, Enea Monzio, et al.
Published: (2026)
Loss Landscape Characterization of Neural Networks without Over-Parametrization
by: Islamov, Rustem, et al.
Published: (2024)
Enhancing Optimizer Stability: Momentum Adaptation of The NGN Step-size
by: Islamov, Rustem, et al.
Published: (2025)
Why Do We Need Warm-up? A Theoretical Perspective
by: Alimisis, Foivos, et al.
Published: (2025)
On the Role of Batch Size in Stochastic Conditional Gradient Methods
by: Islamov, Rustem, et al.
Published: (2026)
Double Momentum and Error Feedback for Clipping with Fast Rates and Differential Privacy
by: Islamov, Rustem, et al.
Published: (2025)
Byzantine-Robust and Differentially Private Federated Optimization under Weaker Assumptions
by: Islamov, Rustem, et al.
Published: (2026)
Safe-EF: Error Feedback for Nonsmooth Constrained Optimization
by: Islamov, Rustem, et al.
Published: (2025)
Explaining Grokking in Transformers through the Lens of Inductive Bias
by: Singh, Jaisidh, et al.
Published: (2026)
Towards Faster Decentralized Stochastic Optimization with Communication Compression
by: Islamov, Rustem, et al.
Published: (2024)
An Adaptive Stochastic Gradient Method with Non-negative Gauss-Newton Stepsizes
by: Orvieto, Antonio, et al.
Published: (2024)
Non-Euclidean Gradient Descent Operates at the Edge of Stability
by: Islamov, Rustem, et al.
Published: (2026)
Theoretical characterisation of the Gauss-Newton conditioning in Neural Networks
by: Zhao, Jim, et al.
Published: (2024)
A Theoretical Analysis of the Learning Dynamics under Class Imbalance
by: Francazi, Emanuele, et al.
Published: (2022)
Geometric Inductive Biases of Deep Networks: The Role of Data and Architecture
by: Movahedi, Sajad, et al.
Published: (2024)
Adam Simplified: Bias Correction Debunked
by: Laing, Sam, et al.
Published: (2025)
Revisiting associative recall in modern recurrent models
by: Okpekpe, Destiny, et al.
Published: (2025)
Theoretical Foundations of Deep Selective State-Space Models
by: Cirone, Nicola Muca, et al.
Published: (2024)
In Search of Adam's Secret Sauce
by: Orvieto, Antonio, et al.
Published: (2025)
Cubic regularized subspace Newton for non-convex optimization
by: Zhao, Jim, et al.
Published: (2024)
Recurrent neural networks: vanishing and exploding gradients are not the end of the story
by: Zucchet, Nicolas, et al.
Published: (2024)
An Uncertainty Principle for Linear Recurrent Neural Networks
by: François, Alexandre, et al.
Published: (2025)
Universal Dynamics of Warmup Stable Decay: understanding WSD beyond Transformers
by: Belloni, Annalisa, et al.
Published: (2026)
Improved state mixing in higher-order and block diagonal linear recurrent networks
by: Dubinin, Igor, et al.
Published: (2026)
When, Where and Why to Average Weights?
by: Ajroldi, Niccolò, et al.
Published: (2025)
Initial Guessing Bias: How Untrained Networks Favor Some Classes
by: Francazi, Emanuele, et al.
Published: (2023)
Characterizing Overfitting in Kernel Ridgeless Regression Through the Eigenspectrum
by: Cheng, Tin Sum, et al.
Published: (2024)
Optimizer choice matters for the emergence of Neural Collapse
by: Zhao, Jim, et al.
Published: (2026)
A Comprehensive Analysis on the Learning Curve in Kernel Ridge Regression
by: Cheng, Tin Sum, et al.
Published: (2024)
Conditional KRR: Injecting Unpenalized Features into Kernel Methods with Applications to Kernel Thresholding
by: Takhanov, Rustem, et al.
Published: (2026)
On the Intrinsic Dimensions of Data in Kernel Learning
by: Takhanov, Rustem
Published: (2026)
The informativeness of the gradient revisited
by: Takhanov, Rustem
Published: (2025)
Is your batch size the problem? Revisiting the Adam-SGD gap in language modeling
by: Srećković, Teodora, et al.
Published: (2025)
Gradient Scalability and Taylor Surrogation of Quantum Cost Landscapes
by: Meyer, Sabri, et al.
Published: (2025)
Optimization Guarantees for Square-Root Natural-Gradient Variational Inference
by: Kumar, Navish, et al.
Published: (2025)
Small Noise Perturbations in Multidimensional Case
by: Pilipenko, Andrey, et al.
Published: (2021)
Multi-layer random features and the approximation power of neural networks
by: Takhanov, Rustem
Published: (2024)