| Main Authors: | Gruntkowska, Kaja, Gaponov, Alexander, Tovmasyan, Zhirayr, Richtárik, Peter |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2510.00643 |
Similar Items
Drop-Muon: Update Less, Converge Faster
by: Gruntkowska, Kaja, et al.
Published: (2025)
Non-Euclidean Broximal Point Method: A Blueprint for Geometry-Aware Optimization
by: Gruntkowska, Kaja, et al.
Published: (2025)
Gluon: Making Muon & Scion Great Again! (Bridging Theory and Practice of LMO-based Optimizers for LLMs)
by: Riabinin, Artem, et al.
Published: (2025)
Improving the Worst-Case Bidirectional Communication Complexity for Nonconvex Distributed Optimization under Function Similarity
by: Gruntkowska, Kaja, et al.
Published: (2024)
Freya PAGE: First Optimal Time Complexity for Large-Scale Nonconvex Finite-Sum Optimization with Heterogeneous Asynchronous Computations
by: Tyurin, Alexander, et al.
Published: (2024)
Rennala MVR: Improved Time Complexity for Parallel Stochastic Optimization via Momentum-Based Variance Reduction
by: Tovmasyan, Zhirayr, et al.
Published: (2026)
Local LMO: Constrained Gradient Optimization via a Local Linear Minimization Oracle
by: Richtárik, Peter, et al.
Published: (2026)
Tighter Performance Theory of FedExProx
by: Anyszka, Wojciech, et al.
Published: (2024)
The Ball-Proximal (="Broximal") Point Method: a New Algorithm, Convergence Theory, and Applications
by: Gruntkowska, Kaja, et al.
Published: (2025)
Revisiting Stochastic Proximal Point Methods: Generalized Smoothness and Similarity
by: Tovmasyan, Zhirayr, et al.
Published: (2025)
Communication Compression for Byzantine Robust Learning: New Efficient Algorithms and Improved Rates
by: Rammal, Ahmad, et al.
Published: (2023)
Stabilized Proximal Point Method via Trust Region Control
by: Li, Hanmin, et al.
Published: (2026)
Broximal Alignment for Global Non-Convex Optimization
by: Gruntkowska, Kaja, et al.
Published: (2026)
Byzantine-Robust and Differentially Private Federated Optimization under Weaker Assumptions
by: Islamov, Rustem, et al.
Published: (2026)
Muon is Provably Faster with Momentum Variance Reduction
by: Qian, Xun, et al.
Published: (2025)
Beyond the Ideal: Analyzing the Inexact Muon Update
by: Shulgin, Egor, et al.
Published: (2025)
Double Momentum and Error Feedback for Clipping with Fast Rates and Differential Privacy
by: Islamov, Rustem, et al.
Published: (2025)
EF21 with Bells & Whistles: Six Algorithmic Extensions of Modern Error Feedback
by: Fatkhullin, Ilyas, et al.
Published: (2021)
Improved Convergence in Parameter-Agnostic Error Feedback through Momentum
by: Sadiev, Abdurakhmon, et al.
Published: (2025)
A Computation and Communication Efficient Method for Distributed Nonconvex Problems in the Partial Participation Setting
by: Tyurin, Alexander, et al.
Published: (2022)
Convergence Analysis of the PAGE Stochastic Algorithm for Weakly Convex Finite-Sum Optimization
by: Condat, Laurent, et al.
Published: (2025)
MARINA-P: Superior Performance in Non-smooth Federated Optimization with Adaptive Stepsizes
by: Sokolov, Igor, et al.
Published: (2024)
Shadowheart SGD: Distributed Asynchronous SGD with Optimal Time Complexity Under Arbitrary Computation and Communication Heterogeneity
by: Tyurin, Alexander, et al.
Published: (2024)
Second-order Optimization under Heavy-Tailed Noise: Hessian Clipping and Sample Complexity Limits
by: Sadiev, Abdurakhmon, et al.
Published: (2025)
A Unified Theory of Stochastic Proximal Point Methods without Smoothness
by: Richtárik, Peter, et al.
Published: (2024)
BiCoLoR: Communication-Efficient Optimization with Bidirectional Compression and Local Training
by: Condat, Laurent, et al.
Published: (2026)
On the Convergence of DP-SGD with Adaptive Clipping
by: Shulgin, Egor, et al.
Published: (2024)
First Provable Guarantees for Practical Private FL: Beyond Restrictive Assumptions
by: Shulgin, Egor, et al.
Published: (2025)
Better LMO-based Momentum Methods with Second-Order Information
by: Khirirat, Sarit, et al.
Published: (2025)
Sparse-ProxSkip: Accelerated Sparse-to-Sparse Training in Federated Learning
by: Meinhardt, Georg, et al.
Published: (2024)
TAMUNA: Doubly Accelerated Distributed Optimization with Local Training, Compression, and Partial Participation
by: Condat, Laurent, et al.
Published: (2023)
Ringmaster ASGD: The First Asynchronous SGD with Optimal Time Complexity
by: Maranjyan, Artavazd, et al.
Published: (2025)
A Novel Unified Parametric Assumption for Nonconvex Optimization
by: Riabinin, Artem, et al.
Published: (2025)
Differentially Private Random Block Coordinate Descent
by: Maranjyan, Artavazd, et al.
Published: (2024)
Phases of Muon: When Muon Eclipses SignSGD
by: Paquette, Elliot, et al.
Published: (2026)
Ringleader ASGD: The First Asynchronous SGD with Optimal Time Complexity under Data Heterogeneity
by: Maranjyan, Artavazd, et al.
Published: (2025)
Towards a Better Theoretical Understanding of Independent Subnetwork Training
by: Shulgin, Egor, et al.
Published: (2023)
Modular Distributed Nonconvex Learning with Error Feedback
by: Carnevale, Guido, et al.
Published: (2025)
MuonBP: Faster Muon via Block-Periodic Orthogonalization
by: Khaled, Ahmed, et al.
Published: (2025)
LiMuon: Light and Fast Muon Optimizer for Large Models
by: Huang, Feihu, et al.
Published: (2025)