Staff View: Efficient GPU implementation of randomized SVD and its applications :: Library Catalog

Bibliographic Details
Main Authors:	Struski, Łukasz; Morkisz, Paweł; Spurek, Przemysław; Bernabeu, Samuel Rodriguez; Trzciński, Tomasz
Format:	Preprint
Published:	2021
Subjects:	Machine Learning Performance
Online Access:	https://arxiv.org/abs/2110.03423

_version_	1866909134181367808
author	Struski, Łukasz; Morkisz, Paweł; Spurek, Przemysław; Bernabeu, Samuel Rodriguez; Trzciński, Tomasz
author_facet	Struski, Łukasz; Morkisz, Paweł; Spurek, Przemysław; Bernabeu, Samuel Rodriguez; Trzciński, Tomasz
contents	Matrix decompositions are ubiquitous in machine learning, with applications in dimensionality reduction, data compression and deep learning algorithms. Typical algorithms for matrix decompositions have polynomial complexity, which significantly increases their computational cost and running time. In this work, we leverage efficient operations that can be run in parallel on modern Graphics Processing Units (GPUs), the predominant computing architecture used e.g. in deep learning, to reduce the computational burden of computing matrix decompositions. More specifically, we reformulate the randomized decomposition problem to incorporate fast matrix multiplication operations (BLAS-3) as building blocks. We show that this formulation, combined with fast random number generators, allows us to fully exploit the potential of parallel processing implemented in GPUs. Our extensive evaluation confirms the superiority of this approach over competing methods, and we release the results of this research as part of the official CUDA implementation (https://docs.nvidia.com/cuda/cusolver/index.html).
format	Preprint
id	arxiv_https___arxiv_org_abs_2110_03423
institution	arXiv
publishDate	2021
record_format	arxiv
title	Efficient GPU implementation of randomized SVD and its applications
topic	Machine Learning Performance
url	https://arxiv.org/abs/2110.03423
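
Illustrative note: the abstract describes a randomized decomposition built from matrix multiplication (BLAS-3) operations and a fast random number generator. The sketch below shows the generic randomized SVD scheme in that spirit, written with NumPy on the CPU. It is not the authors' cuSOLVER implementation; the function name randomized_svd and the parameters oversample, power_iters and seed are assumptions chosen for illustration only.

```python
import numpy as np


def randomized_svd(A, rank, oversample=10, power_iters=2, seed=0):
    """Approximate rank-`rank` SVD of A via random projection (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    # Sketch width: target rank plus a few extra columns, capped by the matrix size.
    k = min(rank + oversample, m, n)

    # Gaussian test matrix; A @ omega is a single large matrix product.
    omega = rng.standard_normal((n, k))
    Q, _ = np.linalg.qr(A @ omega)

    # Optional power iterations sharpen the captured subspace when the
    # singular values decay slowly; each step is again dominated by products with A.
    for _ in range(power_iters):
        Z, _ = np.linalg.qr(A.T @ Q)
        Q, _ = np.linalg.qr(A @ Z)

    # Project A onto the small subspace and take an exact SVD of the k x n matrix.
    B = Q.T @ A
    U_small, s, Vt = np.linalg.svd(B, full_matrices=False)

    # Lift the left singular vectors back to the original space and truncate.
    U = Q @ U_small
    return U[:, :rank], s[:rank], Vt[:rank]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Synthetic test matrix of exact rank 5 (made-up data, for demonstration only).
    A = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 100))
    U, s, Vt = randomized_svd(A, rank=5)
    rel_err = np.linalg.norm(A - (U * s) @ Vt) / np.linalg.norm(A)
    print("relative reconstruction error:", rel_err)
```

Most of the work in this sketch sits in the products A @ omega, A.T @ Q, A @ Z and Q.T @ A, which is the kind of operation the abstract refers to as BLAS-3; the QR and small SVD steps act on much narrower matrices.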

Similar Items