Bibliographic Details
Main Authors: Struski, Łukasz; Morkisz, Paweł; Spurek, Przemysław; Bernabeu, Samuel Rodriguez; Trzciński, Tomasz
Format: Preprint
Published: 2021
Subjects: Machine Learning, Performance
Online Access: https://arxiv.org/abs/2110.03423
_version_ 1866909134181367808
author Struski, Łukasz
Morkisz, Paweł
Spurek, Przemysław
Bernabeu, Samuel Rodriguez
Trzciński, Tomasz
contents Matrix decompositions are ubiquitous in machine learning, with applications in dimensionality reduction, data compression and deep learning algorithms. Typical solutions for matrix decompositions have polynomial complexity, which significantly increases their computational cost and time. In this work, we leverage efficient processing operations that can be run in parallel on modern Graphics Processing Units (GPUs), the predominant computing architecture used e.g. in deep learning, to reduce the computational burden of computing matrix decompositions. More specifically, we reformulate the randomized decomposition problem to incorporate fast matrix multiplication operations (BLAS-3) as building blocks. We show that this formulation, combined with fast random number generators, makes it possible to fully exploit the potential of parallel processing implemented in GPUs. Our extensive evaluation confirms the superiority of this approach over the competing methods, and we release the results of this research as a part of the official CUDA implementation (https://docs.nvidia.com/cuda/cusolver/index.html).
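The abstract does not spell out the authors' formulation, so the following is only a minimal sketch of the generic randomized range-finder idea that randomized SVD methods commonly build on: multiply the input by a random Gaussian matrix, orthonormalize the result, and project the input onto that basis. It is written in plain Python for readability, not for speed. All function names and the `sketch_size` and `seed` parameters are hypothetical, the Gram-Schmidt step is a stand-in chosen for brevity, and each `matmul` call marks a place where a GPU implementation would presumably issue a BLAS-3 matrix multiplication.

```python
import random


def matmul(A, B):
    """Dense matrix product of two list-of-rows matrices (stand-in for a BLAS-3 GEMM)."""
    B_cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in B_cols] for row in A]


def transpose(A):
    """Return the transpose of a list-of-rows matrix."""
    return [list(col) for col in zip(*A)]


def orthonormalize(Y, rel_tol=1e-10):
    """Orthonormalize the columns of Y with modified Gram-Schmidt.

    Columns that are numerically dependent on earlier ones are dropped,
    so the result may have fewer columns than Y.
    """
    basis = []
    for col in transpose(Y):
        v = list(col)
        original_norm = sum(x * x for x in v) ** 0.5
        for q in basis:
            proj = sum(qi * vi for qi, vi in zip(q, v))
            v = [vi - proj * qi for qi, vi in zip(q, v)]
        norm = sum(x * x for x in v) ** 0.5
        if original_norm > 0.0 and norm > rel_tol * original_norm:
            basis.append([x / norm for x in v])
    return transpose(basis)  # columns are the orthonormal basis vectors


def randomized_range_projection(A, sketch_size, seed=0):
    """Return (Q, B) with orthonormal columns in Q and B = Q^T A, so that Q B approximates A.

    Assumption: a plain Gaussian test matrix and no power iterations.
    """
    rng = random.Random(seed)
    n = len(A[0])
    # Random Gaussian test matrix of shape n x sketch_size.
    omega = [[rng.gauss(0.0, 1.0) for _ in range(sketch_size)] for _ in range(n)]
    Y = matmul(A, omega)         # sample the column space of A
    Q = orthonormalize(Y)        # orthonormal basis for the sampled space
    B = matmul(transpose(Q), A)  # small projected matrix
    return Q, B
```

In a full randomized SVD, a dense SVD of the small matrix `B` would follow, and its left singular vectors would be multiplied by `Q` to recover approximate left singular vectors of `A`. That last step is omitted here to keep the sketch dependency-free.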
format Preprint
id arxiv_https___arxiv_org_abs_2110_03423
institution arXiv
publishDate 2021
record_format arxiv
title Efficient GPU implementation of randomized SVD and its applications
topic Machine Learning
Performance
url https://arxiv.org/abs/2110.03423