Library Catalog

Bibliographic Details
Main Authors:	Compagnoni, Enea Monzio; Islamov, Rustem; Proske, Frank Norbert; Lucchi, Aurelien
Format:	Preprint
Published:	2025
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2502.17009

Similar Items

Adaptive Methods through the Lens of SDEs: Theoretical Insights on the Role of Noise
by: Compagnoni, Enea Monzio, et al.
Published: (2024)

On the Interaction of Batch Noise, Adaptivity, and Compression, under $(L_0,L_1)$-Smoothness: An SDE Approach
by: Compagnoni, Enea Monzio, et al.
Published: (2025)

SDEs for Minimax Optimization
by: Compagnoni, Enea Monzio, et al.
Published: (2024)

Adaptive Methods Are Preferable in High Privacy Settings: An SDE Perspective
by: Compagnoni, Enea Monzio, et al.
Published: (2026)

Why Do We Need Warm-up? A Theoretical Perspective
by: Alimisis, Foivos, et al.
Published: (2025)

Enhancing Optimizer Stability: Momentum Adaptation of The NGN Step-size
by: Islamov, Rustem, et al.
Published: (2025)

Loss Landscape Characterization of Neural Networks without Over-Parametrization
by: Islamov, Rustem, et al.
Published: (2024)

Double Momentum and Error Feedback for Clipping with Fast Rates and Differential Privacy
by: Islamov, Rustem, et al.
Published: (2025)

Byzantine-Robust and Differentially Private Federated Optimization under Weaker Assumptions
by: Islamov, Rustem, et al.
Published: (2026)

On the Role of Batch Size in Stochastic Conditional Gradient Methods
by: Islamov, Rustem, et al.
Published: (2026)

Towards Faster Decentralized Stochastic Optimization with Communication Compression
by: Islamov, Rustem, et al.
Published: (2024)

Safe-EF: Error Feedback for Nonsmooth Constrained Optimization
by: Islamov, Rustem, et al.
Published: (2025)

Non-Euclidean Gradient Descent Operates at the Edge of Stability
by: Islamov, Rustem, et al.
Published: (2026)

A Theoretical Analysis of the Learning Dynamics under Class Imbalance
by: Francazi, Emanuele, et al.
Published: (2022)

Unpacking Softmax: How Temperature Drives Representation Collapse, Compression, and Generalization
by: Masarczyk, Wojciech, et al.
Published: (2025)

A Comprehensive Analysis on the Learning Curve in Kernel Ridge Regression
by: Cheng, Tin Sum, et al.
Published: (2024)

On the Intrinsic Dimensions of Data in Kernel Learning
by: Takhanov, Rustem
Published: (2026)

Theoretical characterisation of the Gauss-Newton conditioning in Neural Networks
by: Zhao, Jim, et al.
Published: (2024)

Cubic regularized subspace Newton for non-convex optimization
by: Zhao, Jim, et al.
Published: (2024)

Initial Guessing Bias: How Untrained Networks Favor Some Classes
by: Francazi, Emanuele, et al.
Published: (2023)

Characterizing Overfitting in Kernel Ridgeless Regression Through the Eigenspectrum
by: Cheng, Tin Sum, et al.
Published: (2024)

Optimizer choice matters for the emergence of Neural Collapse
by: Zhao, Jim, et al.
Published: (2026)

The informativeness of the gradient revisited
by: Takhanov, Rustem
Published: (2025)

Gradient Scalability and Taylor Surrogation of Quantum Cost Landscapes
by: Meyer, Sabri, et al.
Published: (2025)

Optimization Guarantees for Square-Root Natural-Gradient Variational Inference
by: Kumar, Navish, et al.
Published: (2025)

Small Noise Perturbations in Multidimensional Case
by: Pilipenko, Andrey, et al.
Published: (2021)

Multi-layer random features and the approximation power of neural networks
by: Takhanov, Rustem
Published: (2024)

Efficient and Unbiased Sampling of Boltzmann Distributions via Consistency Models
by: Zhang, Fengzhe, et al.
Published: (2024)

Regret-Optimal Federated Transfer Learning for Kernel Regression with Applications in American Option Pricing
by: Yang, Xuwei, et al.
Published: (2023)

Unbiased Compression Saves Communication in Distributed Optimization: When and How Much?
by: He, Yutong, et al.
Published: (2023)

Generator Identification for Linear SDEs with Additive and Multiplicative Noise
by: Wang, Yuanyuan, et al.
Published: (2023)

Where You Place the Norm Matters: From Prejudiced to Neutral Initializations
by: Francazi, Emanuele, et al.
Published: (2025)

When Bias Meets Trainability: Connecting Theories of Initialization
by: Bassi, Alberto, et al.
Published: (2025)

StoSignSGD: Unbiased Structural Stochasticity Fixes SignSGD for Training Large Language Models
by: Yu, Dingzhi, et al.
Published: (2026)

Noise-corrected GRPO: From Noisy Rewards to Unbiased Gradients
by: Mansouri, Omar El, et al.
Published: (2025)

Learning Unbiased Permutations via Flow Matching
by: Min, Yimeng, et al.
Published: (2026)

Reward Augmentation in Reinforcement Learning for Testing Distributed Systems
by: Borgarelli, Andrea, et al.
Published: (2024)

A Malliavin calculus approach to score functions in diffusion generative models
by: Mirafzali, Ehsan, et al.
Published: (2025)

Dynamic Context Pruning for Efficient and Interpretable Autoregressive Transformers
by: Anagnostidis, Sotiris, et al.
Published: (2023)

Efficient and Unbiased Sampling from Boltzmann Distributions via Variance-Tuned Diffusion Models
by: Zhang, Fengzhe, et al.
Published: (2025)