Staff View: Unbiased and Sign Compression in Distributed Learning: Comparing Noise Resilience via SDEs :: Library Catalog

Bibliographic Details
Main Authors:	Compagnoni, Enea Monzio; Islamov, Rustem; Proske, Frank Norbert; Lucchi, Aurelien
Format:	Preprint
Published:	2025
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2502.17009

_version_	1866929735198572544
author	Compagnoni, Enea Monzio; Islamov, Rustem; Proske, Frank Norbert; Lucchi, Aurelien
author_facet	Compagnoni, Enea Monzio; Islamov, Rustem; Proske, Frank Norbert; Lucchi, Aurelien
contents	Distributed methods are essential for handling machine learning pipelines comprising large-scale models and datasets. However, their benefits often come at the cost of increased communication overhead between the central server and agents, which can become the main bottleneck, making training costly or even unfeasible in such systems. Compression methods such as quantization and sparsification can alleviate this issue. Still, their robustness to large and heavy-tailed gradient noise, a phenomenon sometimes observed in language modeling, remains poorly understood. This work addresses this gap by analyzing Distributed Compressed SGD (DCSGD) and Distributed SignSGD (DSignSGD) using stochastic differential equations (SDEs). Our results show that DCSGD with unbiased compression is more vulnerable to noise in stochastic gradients, while DSignSGD remains robust, even under large and heavy-tailed noise. Additionally, we propose new scaling rules for hyperparameter tuning to mitigate performance degradation due to compression. These findings are empirically validated across multiple deep learning architectures and datasets, providing practical recommendations for distributed optimization.
format	Preprint
id	arxiv_https___arxiv_org_abs_2502_17009
institution	arXiv
publishDate	2025
record_format	arxiv
title	Unbiased and Sign Compression in Distributed Learning: Comparing Noise Resilience via SDEs
topic	Machine Learning
url	https://arxiv.org/abs/2502.17009

Similar Items