| Main Authors: | Compagnoni, Enea Monzio, Islamov, Rustem, Proske, Frank Norbert, Lucchi, Aurelien |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2502.17009 |
Similar Items
Adaptive Methods through the Lens of SDEs: Theoretical Insights on the Role of Noise
by: Compagnoni, Enea Monzio, et al.
Published: (2024)
On the Interaction of Batch Noise, Adaptivity, and Compression, under $(L_0,L_1)$-Smoothness: An SDE Approach
by: Compagnoni, Enea Monzio, et al.
Published: (2025)
SDEs for Minimax Optimization
by: Compagnoni, Enea Monzio, et al.
Published: (2024)
Adaptive Methods Are Preferable in High Privacy Settings: An SDE Perspective
by: Compagnoni, Enea Monzio, et al.
Published: (2026)
Why Do We Need Warm-up? A Theoretical Perspective
by: Alimisis, Foivos, et al.
Published: (2025)
Enhancing Optimizer Stability: Momentum Adaptation of The NGN Step-size
by: Islamov, Rustem, et al.
Published: (2025)
Loss Landscape Characterization of Neural Networks without Over-Parametrization
by: Islamov, Rustem, et al.
Published: (2024)
Double Momentum and Error Feedback for Clipping with Fast Rates and Differential Privacy
by: Islamov, Rustem, et al.
Published: (2025)
Byzantine-Robust and Differentially Private Federated Optimization under Weaker Assumptions
by: Islamov, Rustem, et al.
Published: (2026)
On the Role of Batch Size in Stochastic Conditional Gradient Methods
by: Islamov, Rustem, et al.
Published: (2026)
Towards Faster Decentralized Stochastic Optimization with Communication Compression
by: Islamov, Rustem, et al.
Published: (2024)
Safe-EF: Error Feedback for Nonsmooth Constrained Optimization
by: Islamov, Rustem, et al.
Published: (2025)
Non-Euclidean Gradient Descent Operates at the Edge of Stability
by: Islamov, Rustem, et al.
Published: (2026)
A Theoretical Analysis of the Learning Dynamics under Class Imbalance
by: Francazi, Emanuele, et al.
Published: (2022)
Unpacking Softmax: How Temperature Drives Representation Collapse, Compression, and Generalization
by: Masarczyk, Wojciech, et al.
Published: (2025)
A Comprehensive Analysis on the Learning Curve in Kernel Ridge Regression
by: Cheng, Tin Sum, et al.
Published: (2024)
On the Intrinsic Dimensions of Data in Kernel Learning
by: Takhanov, Rustem
Published: (2026)
Theoretical characterisation of the Gauss-Newton conditioning in Neural Networks
by: Zhao, Jim, et al.
Published: (2024)
Cubic regularized subspace Newton for non-convex optimization
by: Zhao, Jim, et al.
Published: (2024)
Initial Guessing Bias: How Untrained Networks Favor Some Classes
by: Francazi, Emanuele, et al.
Published: (2023)
Characterizing Overfitting in Kernel Ridgeless Regression Through the Eigenspectrum
by: Cheng, Tin Sum, et al.
Published: (2024)
Optimizer choice matters for the emergence of Neural Collapse
by: Zhao, Jim, et al.
Published: (2026)
The informativeness of the gradient revisited
by: Takhanov, Rustem
Published: (2025)
Gradient Scalability and Taylor Surrogation of Quantum Cost Landscapes
by: Meyer, Sabri, et al.
Published: (2025)
Optimization Guarantees for Square-Root Natural-Gradient Variational Inference
by: Kumar, Navish, et al.
Published: (2025)
Small Noise Perturbations in Multidimensional Case
by: Pilipenko, Andrey, et al.
Published: (2021)
Multi-layer random features and the approximation power of neural networks
by: Takhanov, Rustem
Published: (2024)
Efficient and Unbiased Sampling of Boltzmann Distributions via Consistency Models
by: Zhang, Fengzhe, et al.
Published: (2024)
Regret-Optimal Federated Transfer Learning for Kernel Regression with Applications in American Option Pricing
by: Yang, Xuwei, et al.
Published: (2023)
Unbiased Compression Saves Communication in Distributed Optimization: When and How Much?
by: He, Yutong, et al.
Published: (2023)
Generator Identification for Linear SDEs with Additive and Multiplicative Noise
by: Wang, Yuanyuan, et al.
Published: (2023)
Where You Place the Norm Matters: From Prejudiced to Neutral Initializations
by: Francazi, Emanuele, et al.
Published: (2025)
When Bias Meets Trainability: Connecting Theories of Initialization
by: Bassi, Alberto, et al.
Published: (2025)
StoSignSGD: Unbiased Structural Stochasticity Fixes SignSGD for Training Large Language Models
by: Yu, Dingzhi, et al.
Published: (2026)
Noise-corrected GRPO: From Noisy Rewards to Unbiased Gradients
by: Mansouri, Omar El, et al.
Published: (2025)
Learning Unbiased Permutations via Flow Matching
by: Min, Yimeng, et al.
Published: (2026)
Reward Augmentation in Reinforcement Learning for Testing Distributed Systems
by: Borgarelli, Andrea, et al.
Published: (2024)
A Malliavin calculus approach to score functions in diffusion generative models
by: Mirafzali, Ehsan, et al.
Published: (2025)
Dynamic Context Pruning for Efficient and Interpretable Autoregressive Transformers
by: Anagnostidis, Sotiris, et al.
Published: (2023)
Efficient and Unbiased Sampling from Boltzmann Distributions via Variance-Tuned Diffusion Models
by: Zhang, Fengzhe, et al.
Published: (2025)