Saved in:
Bibliographic Details
Main Authors: Deng, Xiaoge, Shen, Li, Li, Shengwei, Sun, Tao, Li, Dongsheng, Tao, Dacheng
Format: Preprint
Published: 2023
Subjects:
Online Access:https://arxiv.org/abs/2308.09430
Tags: Add Tag
No Tags, Be the first to tag this record!
_version_ 1866916757792358400
author Deng, Xiaoge
Shen, Li
Li, Shengwei
Sun, Tao
Li, Dongsheng
Tao, Dacheng
author_facet Deng, Xiaoge
Shen, Li
Li, Shengwei
Sun, Tao
Li, Dongsheng
Tao, Dacheng
contents Stochastic gradient descent (SGD) performed in an asynchronous manner plays a crucial role in training large-scale machine learning models. However, the generalization performance of asynchronous delayed SGD, which is an essential metric for assessing machine learning algorithms, has rarely been explored. Existing generalization error bounds are rather pessimistic and cannot reveal the correlation between asynchronous delays and generalization. In this paper, we investigate sharper generalization error bound for SGD with asynchronous delay $τ$. Leveraging the generating function analysis tool, we first establish the average stability of the delayed gradient algorithm. Based on this algorithmic stability, we provide upper bounds on the generalization error of $\tilde{\mathcal{O}}(\frac{T-τ}{nτ})$ and $\tilde{\mathcal{O}}(\frac{1}{n})$ for quadratic convex and strongly convex problems, respectively, where $T$ refers to the iteration number and $n$ is the amount of training data. Our theoretical results indicate that asynchronous delays reduce the generalization error of the delayed SGD algorithm. Analogous analysis can be generalized to the random delay setting, and the experimental results validate our theoretical findings.
format Preprint
id arxiv_https___arxiv_org_abs_2308_09430
institution arXiv
publishDate 2023
record_format arxiv
spellingShingle Towards Understanding the Generalizability of Delayed Stochastic Gradient Descent
Deng, Xiaoge
Shen, Li
Li, Shengwei
Sun, Tao
Li, Dongsheng
Tao, Dacheng
Machine Learning
Stochastic gradient descent (SGD) performed in an asynchronous manner plays a crucial role in training large-scale machine learning models. However, the generalization performance of asynchronous delayed SGD, which is an essential metric for assessing machine learning algorithms, has rarely been explored. Existing generalization error bounds are rather pessimistic and cannot reveal the correlation between asynchronous delays and generalization. In this paper, we investigate sharper generalization error bound for SGD with asynchronous delay $τ$. Leveraging the generating function analysis tool, we first establish the average stability of the delayed gradient algorithm. Based on this algorithmic stability, we provide upper bounds on the generalization error of $\tilde{\mathcal{O}}(\frac{T-τ}{nτ})$ and $\tilde{\mathcal{O}}(\frac{1}{n})$ for quadratic convex and strongly convex problems, respectively, where $T$ refers to the iteration number and $n$ is the amount of training data. Our theoretical results indicate that asynchronous delays reduce the generalization error of the delayed SGD algorithm. Analogous analysis can be generalized to the random delay setting, and the experimental results validate our theoretical findings.
title Towards Understanding the Generalizability of Delayed Stochastic Gradient Descent
topic Machine Learning
url https://arxiv.org/abs/2308.09430