| Main Authors: | Yu, Shuhua; Zhou, Ding; Xie, Cong; Xu, An; Zhang, Zhi; Liu, Xin; Kar, Soummya |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2411.17866 |
Similar Items
Large Deviation Upper Bounds and Improved MSE Rates of Nonlinear SGD: Heavy-tailed Noise and Power of Symmetry
by: Armacki, Aleksandar, et al.
Published: (2024)
Nonlinear Stochastic Gradient Descent and Heavy-tailed Noise: A Unified Framework and High-probability Guarantees
by: Armacki, Aleksandar, et al.
Published: (2024)
A Unified Framework for Center-based Clustering of Distributed Data
by: Armacki, Aleksandar, et al.
Published: (2024)
Model-Free Learning and Optimal Policy Design in Multi-Agent MDPs Under Probabilistic Agent Dropout
by: Fiscko, Carmel, et al.
Published: (2023)
Distributed gradient methods under heavy-tailed communication noise
by: Vukovic, Manojlo, et al.
Published: (2025)
Distributed Gradient Clustering: Convergence and the Effect of Initialization
by: Armacki, Aleksandar, et al.
Published: (2026)
Smoothed Gradient Clipping and Error Feedback for Decentralized Optimization under Symmetric Heavy-Tailed Noise
by: Yu, Shuhua, et al.
Published: (2023)
Decentralized Nonconvex Optimization under Heavy-Tailed Noise: Normalization and Optimal Convergence
by: Yu, Shuhua, et al.
Published: (2025)
Sharp High-Probability Rates for Nonlinear SGD under Heavy-Tailed Noise via Symmetrization
by: Armacki, Aleksandar, et al.
Published: (2025)
AdaPM: a Partial Momentum Algorithm for LLM Training
by: Zhang, Yimu, et al.
Published: (2025)
Federated Multi-Objective Learning with Controlled Pareto Frontiers
by: Rao, Jiansheng, et al.
Published: (2025)
Tight Long-Term Tail Decay of (Clipped) SGD in Non-Convex Optimization
by: Armacki, Aleksandar, et al.
Published: (2026)
RMNP: Row-Momentum Normalized Preconditioning for Scalable Matrix-Based Optimization
by: Deng, Shenyang, et al.
Published: (2026)
Improved Analysis for Sign-based Methods with Momentum Updates
by: Jiang, Wei, et al.
Published: (2025)
MoE-Pruner: Pruning Mixture-of-Experts Large Language Model using the Hints from Its Router
by: Xie, Yanyue, et al.
Published: (2024)
High-probability Convergence Bounds for Nonlinear Stochastic Gradient Descent Under Heavy-tailed Noise
by: Armacki, Aleksandar, et al.
Published: (2023)
Enhanced Momentum with Momentum Transformers
by: Mason, Max, et al.
Published: (2024)
Local Steps Speed Up Local GD for Heterogeneous Distributed Logistic Regression
by: Crawshaw, Michael, et al.
Published: (2025)
Dynamically Weighted Momentum with Adaptive Step Sizes for Efficient Deep Network Training
by: Wang, Zhifeng, et al.
Published: (2025)
Ordered Local Momentum for Asynchronous Distributed Learning under Arbitrary Delays
by: Shi, Chang-Wei, et al.
Published: (2026)
Stochastic Normalized Gradient Descent with Momentum for Large-Batch Training
by: Zhao, Shen-Yi, et al.
Published: (2020)
Global Momentum Compression for Sparse Communication in Distributed Learning
by: Shi, Chang-Wei, et al.
Published: (2019)
StoSignSGD: Unbiased Structural Stochasticity Fixes SignSGD for Training Large Language Models
by: Yu, Dingzhi, et al.
Published: (2026)
On the Training Convergence of Transformers for In-Context Classification of Gaussian Mixtures
by: Shen, Wei, et al.
Published: (2024)
PrunedLoRA: Robust Gradient-Based structured pruning for Low-rank Adaptation in Fine-tuning
by: Yu, Xin, et al.
Published: (2025)
Decentralized Local Voltage Control for Active Distribution Networks
by: Fernandes, Diana Vieira, et al.
Published: (2025)
Beyond the Class Subspace: Teacher-Guided Training for Reliable Out-of-Distribution Detection in Single-Domain Models
by: Yang, Hong, et al.
Published: (2026)
Distributed Truncated Predictive Control for Networked Systems under Uncertainty: Stability and Near-Optimality Guarantee
by: Xu, Eric, et al.
Published: (2023)
Distributed Low-Communication Training with Decoupled Momentum Optimization
by: Nedelkoski, Sasho, et al.
Published: (2025)
Self-Explainable Graph Transformer for Link Sign Prediction
by: Li, Lu, et al.
Published: (2024)
FedMomentum: Preserving LoRA Training Momentum in Federated Fine-Tuning
by: Yan, Peishen, et al.
Published: (2026)
High-dimensional limit theorems for SGD: Momentum and Adaptive Step-sizes
by: Jagannath, Aukosh, et al.
Published: (2025)
Enhancing Optimizer Stability: Momentum Adaptation of The NGN Step-size
by: Islamov, Rustem, et al.
Published: (2025)
Leveraging Coordinate Momentum in SignSGD and Muon: Memory-Optimized Zero-Order
by: Petrov, Egor, et al.
Published: (2025)
Enhancing Signed Graph Neural Networks through Curriculum-Based Training
by: Zhang, Zeyu, et al.
Published: (2023)
CLASP: Training-Free LLM-Assisted Source Code Watermarking via Semantic-Preserving Transformations
by: Xu, Rui, et al.
Published: (2025)
Training-Free ANN-to-SNN Conversion for High-Performance Spiking Transformer
by: Wang, Jingya, et al.
Published: (2025)
Embracing Unknown Step by Step: Towards Reliable Sparse Training in Real World
by: Lei, Bowen, et al.
Published: (2024)
On the Optimization and Generalization of Two-layer Transformers with Sign Gradient Descent
by: Li, Bingrui, et al.
Published: (2024)
MLorc: Momentum Low-rank Compression for Memory Efficient Large Language Model Adaptation
by: Shen, Wei, et al.
Published: (2025)