Bibliographic Details
Main Authors: Compagnoni, Enea Monzio; Islamov, Rustem; Proske, Frank Norbert; Lucchi, Aurelien
Format: Preprint
Published: 2025
Online Access: https://arxiv.org/abs/2502.17009
contents Distributed methods are essential for handling machine learning pipelines comprising large-scale models and datasets. However, their benefits often come at the cost of increased communication overhead between the central server and agents, which can become the main bottleneck, making training costly or even unfeasible in such systems. Compression methods such as quantization and sparsification can alleviate this issue. Still, their robustness to large and heavy-tailed gradient noise, a phenomenon sometimes observed in language modeling, remains poorly understood. This work addresses this gap by analyzing Distributed Compressed SGD (DCSGD) and Distributed SignSGD (DSignSGD) using stochastic differential equations (SDEs). Our results show that DCSGD with unbiased compression is more vulnerable to noise in stochastic gradients, while DSignSGD remains robust, even under large and heavy-tailed noise. Additionally, we propose new scaling rules for hyperparameter tuning to mitigate performance degradation due to compression. These findings are empirically validated across multiple deep learning architectures and datasets, providing practical recommendations for distributed optimization.
id arxiv_https___arxiv_org_abs_2502_17009
institution arXiv
record_format arxiv
title Unbiased and Sign Compression in Distributed Learning: Comparing Noise Resilience via SDEs
topic Machine Learning