:: Library Catalog

Bibliographic Details
Main Authors:	Compagnoni, Enea Monzio; Liu, Tianlin; Islamov, Rustem; Proske, Frank Norbert; Orvieto, Antonio; Lucchi, Aurelien
Format:	Preprint
Published:	2024
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2411.15958

Similar Items

Unbiased and Sign Compression in Distributed Learning: Comparing Noise Resilience via SDEs
by: Compagnoni, Enea Monzio, et al.
Published: (2025)

On the Interaction of Batch Noise, Adaptivity, and Compression, under $(L_0,L_1)$-Smoothness: An SDE Approach
by: Compagnoni, Enea Monzio, et al.
Published: (2025)

SDEs for Minimax Optimization
by: Compagnoni, Enea Monzio, et al.
Published: (2024)

Adaptive Methods Are Preferable in High Privacy Settings: An SDE Perspective
by: Compagnoni, Enea Monzio, et al.
Published: (2026)

Loss Landscape Characterization of Neural Networks without Over-Parametrization
by: Islamov, Rustem, et al.
Published: (2024)

Enhancing Optimizer Stability: Momentum Adaptation of The NGN Step-size
by: Islamov, Rustem, et al.
Published: (2025)

Why Do We Need Warm-up? A Theoretical Perspective
by: Alimisis, Foivos, et al.
Published: (2025)

On the Role of Batch Size in Stochastic Conditional Gradient Methods
by: Islamov, Rustem, et al.
Published: (2026)

Double Momentum and Error Feedback for Clipping with Fast Rates and Differential Privacy
by: Islamov, Rustem, et al.
Published: (2025)

Byzantine-Robust and Differentially Private Federated Optimization under Weaker Assumptions
by: Islamov, Rustem, et al.
Published: (2026)

Safe-EF: Error Feedback for Nonsmooth Constrained Optimization
by: Islamov, Rustem, et al.
Published: (2025)

Explaining Grokking in Transformers through the Lens of Inductive Bias
by: Singh, Jaisidh, et al.
Published: (2026)

Towards Faster Decentralized Stochastic Optimization with Communication Compression
by: Islamov, Rustem, et al.
Published: (2024)

An Adaptive Stochastic Gradient Method with Non-negative Gauss-Newton Stepsizes
by: Orvieto, Antonio, et al.
Published: (2024)

Non-Euclidean Gradient Descent Operates at the Edge of Stability
by: Islamov, Rustem, et al.
Published: (2026)

Theoretical characterisation of the Gauss-Newton conditioning in Neural Networks
by: Zhao, Jim, et al.
Published: (2024)

A Theoretical Analysis of the Learning Dynamics under Class Imbalance
by: Francazi, Emanuele, et al.
Published: (2022)

Geometric Inductive Biases of Deep Networks: The Role of Data and Architecture
by: Movahedi, Sajad, et al.
Published: (2024)

Adam Simplified: Bias Correction Debunked
by: Laing, Sam, et al.
Published: (2025)

Revisiting associative recall in modern recurrent models
by: Okpekpe, Destiny, et al.
Published: (2025)

Theoretical Foundations of Deep Selective State-Space Models
by: Cirone, Nicola Muca, et al.
Published: (2024)

In Search of Adam's Secret Sauce
by: Orvieto, Antonio, et al.
Published: (2025)

Cubic regularized subspace Newton for non-convex optimization
by: Zhao, Jim, et al.
Published: (2024)

Recurrent neural networks: vanishing and exploding gradients are not the end of the story
by: Zucchet, Nicolas, et al.
Published: (2024)

An Uncertainty Principle for Linear Recurrent Neural Networks
by: François, Alexandre, et al.
Published: (2025)

Universal Dynamics of Warmup Stable Decay: understanding WSD beyond Transformers
by: Belloni, Annalisa, et al.
Published: (2026)

Improved state mixing in higher-order and block diagonal linear recurrent networks
by: Dubinin, Igor, et al.
Published: (2026)

When, Where and Why to Average Weights?
by: Ajroldi, Niccolò, et al.
Published: (2025)

Initial Guessing Bias: How Untrained Networks Favor Some Classes
by: Francazi, Emanuele, et al.
Published: (2023)

Characterizing Overfitting in Kernel Ridgeless Regression Through the Eigenspectrum
by: Cheng, Tin Sum, et al.
Published: (2024)

Optimizer choice matters for the emergence of Neural Collapse
by: Zhao, Jim, et al.
Published: (2026)

A Comprehensive Analysis on the Learning Curve in Kernel Ridge Regression
by: Cheng, Tin Sum, et al.
Published: (2024)

Conditional KRR: Injecting Unpenalized Features into Kernel Methods with Applications to Kernel Thresholding
by: Takhanov, Rustem, et al.
Published: (2026)

On the Intrinsic Dimensions of Data in Kernel Learning
by: Takhanov, Rustem
Published: (2026)

The informativeness of the gradient revisited
by: Takhanov, Rustem
Published: (2025)

Is your batch size the problem? Revisiting the Adam-SGD gap in language modeling
by: Srećković, Teodora, et al.
Published: (2025)

Gradient Scalability and Taylor Surrogation of Quantum Cost Landscapes
by: Meyer, Sabri, et al.
Published: (2025)

Optimization Guarantees for Square-Root Natural-Gradient Variational Inference
by: Kumar, Navish, et al.
Published: (2025)

Small Noise Perturbations in Multidimensional Case
by: Pilipenko, Andrey, et al.
Published: (2021)

Multi-layer random features and the approximation power of neural networks
by: Takhanov, Rustem
Published: (2024)