Vista Equipo: :: Library Catalog

Guardado en:

Detalles Bibliográficos
Autor principal:	Thomas, Stephen
Formato:	Preprint
Publicado:	2025
Materias:	Numerical Analysis Machine Learning
Acceso en línea:	https://arxiv.org/abs/2505.07046
Etiquetas:	Agregar Etiqueta Sin Etiquetas, Sea el primero en etiquetar este registro!

_version_	1866916732092809216
author	Thomas, Stephen
author_facet	Thomas, Stephen
contents	We present SKA-SGD (Streaming Krylov-Accelerated Stochastic Gradient Descent), a novel optimization approach that accelerates convergence for ill-conditioned problems by projecting stochastic gradients onto a low-dimensional Krylov subspace. Directly inspired by recent advances in s-step Conjugate Gradient methods with streaming Gauss-Seidel Gram solvers \cite{dambra2025sstep}, our method extends these techniques to the stochastic optimization domain. Our approach combines three key innovations: (1) projection coefficients computed via a single streaming Gauss-Seidel iteration, which is mathematically equivalent to Modified Gram-Schmidt orthogonalization; (2) a Chebyshev polynomial basis for constructing the Krylov subspace, providing superior numerical stability; and (3) efficient implementation for AMD GPUs using HIP. We prove that our streaming approach achieves a backward error near machine precision with $O(s^2)$ complexity rather than $O(s^3)$, where $s$ is the Krylov subspace dimension. Experimental results demonstrate that SKA-SGD significantly outperforms standard SGD and Adam in convergence rate and final error, particularly for problems with condition numbers exceeding $10^3$. GPU performance analysis reveals a crossover point where communication-avoiding benefits outweigh computational overhead, typically occurring at moderate scale ($p \approx 64$ processors) for problem sizes $n \geq 10^6$.
format	Preprint
id	arxiv_https___arxiv_org_abs_2505_07046
institution	arXiv
publishDate	2025
record_format	arxiv
spellingShingle	Streaming Krylov-Accelerated Stochastic Gradient Descent Thomas, Stephen Numerical Analysis Machine Learning We present SKA-SGD (Streaming Krylov-Accelerated Stochastic Gradient Descent), a novel optimization approach that accelerates convergence for ill-conditioned problems by projecting stochastic gradients onto a low-dimensional Krylov subspace. Directly inspired by recent advances in s-step Conjugate Gradient methods with streaming Gauss-Seidel Gram solvers \cite{dambra2025sstep}, our method extends these techniques to the stochastic optimization domain. Our approach combines three key innovations: (1) projection coefficients computed via a single streaming Gauss-Seidel iteration, which is mathematically equivalent to Modified Gram-Schmidt orthogonalization; (2) a Chebyshev polynomial basis for constructing the Krylov subspace, providing superior numerical stability; and (3) efficient implementation for AMD GPUs using HIP. We prove that our streaming approach achieves a backward error near machine precision with $O(s^2)$ complexity rather than $O(s^3)$, where $s$ is the Krylov subspace dimension. Experimental results demonstrate that SKA-SGD significantly outperforms standard SGD and Adam in convergence rate and final error, particularly for problems with condition numbers exceeding $10^3$. GPU performance analysis reveals a crossover point where communication-avoiding benefits outweigh computational overhead, typically occurring at moderate scale ($p \approx 64$ processors) for problem sizes $n \geq 10^6$.
title	Streaming Krylov-Accelerated Stochastic Gradient Descent
topic	Numerical Analysis Machine Learning
url	https://arxiv.org/abs/2505.07046

Ejemplares similares