Staff View: :: Library Catalog

Saved in:

Bibliographic Details
Main Authors:	Deng, Xiaoge, Shen, Li, Li, Shengwei, Sun, Tao, Li, Dongsheng, Tao, Dacheng
Format:	Preprint
Published:	2023
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2308.09430
Tags:	Add Tag No Tags, Be the first to tag this record!

_version_	1866916757792358400
author	Deng, Xiaoge Shen, Li Li, Shengwei Sun, Tao Li, Dongsheng Tao, Dacheng
author_facet	Deng, Xiaoge Shen, Li Li, Shengwei Sun, Tao Li, Dongsheng Tao, Dacheng
contents	Stochastic gradient descent (SGD) performed in an asynchronous manner plays a crucial role in training large-scale machine learning models. However, the generalization performance of asynchronous delayed SGD, which is an essential metric for assessing machine learning algorithms, has rarely been explored. Existing generalization error bounds are rather pessimistic and cannot reveal the correlation between asynchronous delays and generalization. In this paper, we investigate sharper generalization error bound for SGD with asynchronous delay $τ$. Leveraging the generating function analysis tool, we first establish the average stability of the delayed gradient algorithm. Based on this algorithmic stability, we provide upper bounds on the generalization error of $\tilde{\mathcal{O}}(\frac{T-τ}{nτ})$ and $\tilde{\mathcal{O}}(\frac{1}{n})$ for quadratic convex and strongly convex problems, respectively, where $T$ refers to the iteration number and $n$ is the amount of training data. Our theoretical results indicate that asynchronous delays reduce the generalization error of the delayed SGD algorithm. Analogous analysis can be generalized to the random delay setting, and the experimental results validate our theoretical findings.
format	Preprint
id	arxiv_https___arxiv_org_abs_2308_09430
institution	arXiv
publishDate	2023
record_format	arxiv
spellingShingle	Towards Understanding the Generalizability of Delayed Stochastic Gradient Descent Deng, Xiaoge Shen, Li Li, Shengwei Sun, Tao Li, Dongsheng Tao, Dacheng Machine Learning Stochastic gradient descent (SGD) performed in an asynchronous manner plays a crucial role in training large-scale machine learning models. However, the generalization performance of asynchronous delayed SGD, which is an essential metric for assessing machine learning algorithms, has rarely been explored. Existing generalization error bounds are rather pessimistic and cannot reveal the correlation between asynchronous delays and generalization. In this paper, we investigate sharper generalization error bound for SGD with asynchronous delay $τ$. Leveraging the generating function analysis tool, we first establish the average stability of the delayed gradient algorithm. Based on this algorithmic stability, we provide upper bounds on the generalization error of $\tilde{\mathcal{O}}(\frac{T-τ}{nτ})$ and $\tilde{\mathcal{O}}(\frac{1}{n})$ for quadratic convex and strongly convex problems, respectively, where $T$ refers to the iteration number and $n$ is the amount of training data. Our theoretical results indicate that asynchronous delays reduce the generalization error of the delayed SGD algorithm. Analogous analysis can be generalized to the random delay setting, and the experimental results validate our theoretical findings.
title	Towards Understanding the Generalizability of Delayed Stochastic Gradient Descent
topic	Machine Learning
url	https://arxiv.org/abs/2308.09430

Similar Items