:: Library Catalog

Bibliographic Details
Main Authors:	Yu, Shuhua; Zhou, Ding; Xie, Cong; Xu, An; Zhang, Zhi; Liu, Xin; Kar, Soummya
Format:	Preprint
Published:	2024
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2411.17866

Similar Items

Large Deviation Upper Bounds and Improved MSE Rates of Nonlinear SGD: Heavy-tailed Noise and Power of Symmetry
by: Armacki, Aleksandar, et al.
Published: (2024)

Nonlinear Stochastic Gradient Descent and Heavy-tailed Noise: A Unified Framework and High-probability Guarantees
by: Armacki, Aleksandar, et al.
Published: (2024)

A Unified Framework for Center-based Clustering of Distributed Data
by: Armacki, Aleksandar, et al.
Published: (2024)

Model-Free Learning and Optimal Policy Design in Multi-Agent MDPs Under Probabilistic Agent Dropout
by: Fiscko, Carmel, et al.
Published: (2023)

Distributed gradient methods under heavy-tailed communication noise
by: Vukovic, Manojlo, et al.
Published: (2025)

Distributed Gradient Clustering: Convergence and the Effect of Initialization
by: Armacki, Aleksandar, et al.
Published: (2026)

Smoothed Gradient Clipping and Error Feedback for Decentralized Optimization under Symmetric Heavy-Tailed Noise
by: Yu, Shuhua, et al.
Published: (2023)

Decentralized Nonconvex Optimization under Heavy-Tailed Noise: Normalization and Optimal Convergence
by: Yu, Shuhua, et al.
Published: (2025)

Sharp High-Probability Rates for Nonlinear SGD under Heavy-Tailed Noise via Symmetrization
by: Armacki, Aleksandar, et al.
Published: (2025)

AdaPM: a Partial Momentum Algorithm for LLM Training
by: Zhang, Yimu, et al.
Published: (2025)

Federated Multi-Objective Learning with Controlled Pareto Frontiers
by: Rao, Jiansheng, et al.
Published: (2025)

Tight Long-Term Tail Decay of (Clipped) SGD in Non-Convex Optimization
by: Armacki, Aleksandar, et al.
Published: (2026)

RMNP: Row-Momentum Normalized Preconditioning for Scalable Matrix-Based Optimization
by: Deng, Shenyang, et al.
Published: (2026)

Improved Analysis for Sign-based Methods with Momentum Updates
by: Jiang, Wei, et al.
Published: (2025)

MoE-Pruner: Pruning Mixture-of-Experts Large Language Model using the Hints from Its Router
by: Xie, Yanyue, et al.
Published: (2024)

High-probability Convergence Bounds for Nonlinear Stochastic Gradient Descent Under Heavy-tailed Noise
by: Armacki, Aleksandar, et al.
Published: (2023)

Enhanced Momentum with Momentum Transformers
by: Mason, Max, et al.
Published: (2024)

Local Steps Speed Up Local GD for Heterogeneous Distributed Logistic Regression
by: Crawshaw, Michael, et al.
Published: (2025)

Dynamically Weighted Momentum with Adaptive Step Sizes for Efficient Deep Network Training
by: Wang, Zhifeng, et al.
Published: (2025)

Ordered Local Momentum for Asynchronous Distributed Learning under Arbitrary Delays
by: Shi, Chang-Wei, et al.
Published: (2026)

Stochastic Normalized Gradient Descent with Momentum for Large-Batch Training
by: Zhao, Shen-Yi, et al.
Published: (2020)

Global Momentum Compression for Sparse Communication in Distributed Learning
by: Shi, Chang-Wei, et al.
Published: (2019)

StoSignSGD: Unbiased Structural Stochasticity Fixes SignSGD for Training Large Language Models
by: Yu, Dingzhi, et al.
Published: (2026)

On the Training Convergence of Transformers for In-Context Classification of Gaussian Mixtures
by: Shen, Wei, et al.
Published: (2024)

PrunedLoRA: Robust Gradient-Based structured pruning for Low-rank Adaptation in Fine-tuning
by: Yu, Xin, et al.
Published: (2025)

Decentralized Local Voltage Control for Active Distribution Networks
by: Fernandes, Diana Vieira, et al.
Published: (2025)

Beyond the Class Subspace: Teacher-Guided Training for Reliable Out-of-Distribution Detection in Single-Domain Models
by: Yang, Hong, et al.
Published: (2026)

Distributed Truncated Predictive Control for Networked Systems under Uncertainty: Stability and Near-Optimality Guarantee
by: Xu, Eric, et al.
Published: (2023)

Distributed Low-Communication Training with Decoupled Momentum Optimization
by: Nedelkoski, Sasho, et al.
Published: (2025)

Self-Explainable Graph Transformer for Link Sign Prediction
by: Li, Lu, et al.
Published: (2024)

FedMomentum: Preserving LoRA Training Momentum in Federated Fine-Tuning
by: Yan, Peishen, et al.
Published: (2026)

High-dimensional limit theorems for SGD: Momentum and Adaptive Step-sizes
by: Jagannath, Aukosh, et al.
Published: (2025)

Enhancing Optimizer Stability: Momentum Adaptation of The NGN Step-size
by: Islamov, Rustem, et al.
Published: (2025)

Leveraging Coordinate Momentum in SignSGD and Muon: Memory-Optimized Zero-Order
by: Petrov, Egor, et al.
Published: (2025)

Enhancing Signed Graph Neural Networks through Curriculum-Based Training
by: Zhang, Zeyu, et al.
Published: (2023)

CLASP: Training-Free LLM-Assisted Source Code Watermarking via Semantic-Preserving Transformations
by: Xu, Rui, et al.
Published: (2025)

Training-Free ANN-to-SNN Conversion for High-Performance Spiking Transformer
by: Wang, Jingya, et al.
Published: (2025)

Embracing Unknown Step by Step: Towards Reliable Sparse Training in Real World
by: Lei, Bowen, et al.
Published: (2024)

On the Optimization and Generalization of Two-layer Transformers with Sign Gradient Descent
by: Li, Bingrui, et al.
Published: (2024)

MLorc: Momentum Low-rank Compression for Memory Efficient Large Language Model Adaptation
by: Shen, Wei, et al.
Published: (2025)