Staff View: Convergence of Distributed Adaptive Optimization with Local Updates :: Library Catalog

Bibliographic Details
Main Authors:	Cheng, Ziheng; Glasgow, Margalit
Format:	Preprint
Published:	2024
Subjects:	Machine Learning; Optimization and Control
Online Access:	https://arxiv.org/abs/2409.13155

_version_	1866912230213156864
author	Cheng, Ziheng; Glasgow, Margalit
author_facet	Cheng, Ziheng; Glasgow, Margalit
contents	We study distributed adaptive algorithms with local updates (intermittent communication). Despite the great empirical success of adaptive methods in distributed training of modern machine learning models, the theoretical benefits of local updates within adaptive methods, particularly in terms of reducing communication complexity, are not yet fully understood. In this paper, for the first time, we prove that Local SGD with momentum (Local SGDM) and Local Adam can outperform their minibatch counterparts in certain regimes, in the convex and weakly convex settings, respectively. Our analysis relies on a novel technique for proving contraction during local iterations, a crucial yet challenging step in showing the advantages of local updates, under a generalized smoothness assumption and a gradient clipping strategy.
format	Preprint
id	arxiv_https___arxiv_org_abs_2409_13155
institution	arXiv
publishDate	2024
record_format	arxiv
title	Convergence of Distributed Adaptive Optimization with Local Updates
topic	Machine Learning; Optimization and Control
url	https://arxiv.org/abs/2409.13155
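The abstract contrasts Local SGDM, where each worker takes several local steps between communications, with its minibatch counterpart, which spends the same per-round gradient budget on one larger batch evaluated at a shared point. A minimal sketch of that contrast on a toy problem follows. Everything in it is an illustrative assumption, not the paper's algorithm or analysis: the quadratic worker losses, the exponential-moving-average form of momentum, averaging both iterates and momentum buffers at each communication, per-gradient norm clipping, and all names (`local_sgdm`, `minibatch_sgdm`, `clip_c`, `targets`, and so on). Local Adam is not shown.

```python
import math
import random


def clip(g, c):
    """Rescale vector g so that its Euclidean norm is at most c."""
    norm = math.sqrt(sum(v * v for v in g))
    if norm <= c:
        return list(g)
    return [v * (c / norm) for v in g]


def noisy_grad(x, target, sigma, rng):
    """Stochastic gradient of a hypothetical worker loss 0.5 * ||x - target||^2."""
    return [xi - ti + rng.gauss(0.0, sigma) for xi, ti in zip(x, target)]


def momentum_step(x, mom, g, lr, beta):
    """One momentum step using an exponential moving average of gradients (assumed form)."""
    mom = [beta * mi + (1.0 - beta) * gi for mi, gi in zip(mom, g)]
    x = [xi - lr * mi for xi, mi in zip(x, mom)]
    return x, mom


def average(vectors):
    """Coordinate-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def local_sgdm(x0, targets, rounds, local_steps, lr, beta, clip_c, sigma, seed=0):
    """Hypothetical Local SGDM: each worker takes `local_steps` clipped momentum
    steps from the shared state; one communication then averages the workers'
    iterates and momentum buffers."""
    rng = random.Random(seed)
    x, mom = list(x0), [0.0] * len(x0)
    for _ in range(rounds):
        xs, moms = [], []
        for target in targets:
            xm, mm = list(x), list(mom)
            for _ in range(local_steps):
                g = clip(noisy_grad(xm, target, sigma, rng), clip_c)
                xm, mm = momentum_step(xm, mm, g, lr, beta)
            xs.append(xm)
            moms.append(mm)
        x, mom = average(xs), average(moms)  # one communication round
    return x


def minibatch_sgdm(x0, targets, rounds, batch_per_worker, lr, beta, clip_c, sigma, seed=0):
    """Hypothetical minibatch SGDM baseline: per communication round, every worker
    evaluates `batch_per_worker` gradients at the shared iterate, and one clipped
    momentum step is taken on their average."""
    rng = random.Random(seed)
    x, mom = list(x0), [0.0] * len(x0)
    for _ in range(rounds):
        grads = [noisy_grad(x, t, sigma, rng) for t in targets for _ in range(batch_per_worker)]
        g = clip(average(grads), clip_c)
        x, mom = momentum_step(x, mom, g, lr, beta)
    return x


# Usage: three workers with different optima; the averaged loss is minimized at their mean.
targets = [[1.0, 2.0], [3.0, -1.0], [-1.0, 0.0]]
optimum = average(targets)
common = dict(x0=[0.0, 0.0], targets=targets, rounds=50, lr=0.1, beta=0.9, clip_c=10.0, sigma=0.1)
x_local = local_sgdm(local_steps=10, **common)
x_mb = minibatch_sgdm(batch_per_worker=10, **common)
print("Local SGDM distance to optimum:", math.dist(x_local, optimum))
print("Minibatch SGDM distance to optimum:", math.dist(x_mb, optimum))
```

Both runs use the same number of communication rounds and gradient evaluations, so they differ only in where the gradients are evaluated. Because every toy worker loss here has the same identity Hessian, the averaged Local SGDM iterate follows the same trajectory as a centralized momentum method whenever clipping does not trigger. The example therefore shows the mechanics of local updates; it does not reproduce the paper's regime-dependent comparison.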