Staff View: Random Alloy Codes and the Fundamental Limits of Coded Distributed Tensors :: Library Catalog

Bibliographic Details
Main Author:	Soto, Pedro
Format:	Preprint
Published:	2022
Subjects:	Information Theory; Distributed, Parallel, and Cluster Computing; Machine Learning; Numerical Analysis; Symbolic Computation; E.4; H.1.1; C.2.4; B.8.1; C.4; G.1.3; I.2.6; I.1.2
Online Access:	https://arxiv.org/abs/2202.03469

_version_	1866912090238746624
author	Soto, Pedro
author_facet	Soto, Pedro
contents	Tensors are a fundamental operation in distributed computing, e.g., machine learning, that are commonly distributed into multiple parallel tasks for large datasets. Stragglers and other failures can severely impact the overall completion time. Recent works in coded computing provide a novel strategy to mitigate stragglers with coded tasks, with an objective of minimizing the number of tasks needed to recover the overall result, known as the recovery threshold. However, we demonstrate that this strict combinatorial definition does not directly optimize the probability of failure. In this paper, we focus on the most likely event and measure the optimality of a coding scheme more directly by its probability of decoding. Our probabilistic approach leads us to a practical construction of random codes for matrix multiplication, i.e., locally random alloy codes, which are optimal with respect to the measures. Furthermore, the probabilistic approach allows us to discover a surprising impossibility theorem about both random and deterministic coded distributed tensors.
format	Preprint
id	arxiv_https___arxiv_org_abs_2202_03469
institution	arXiv
publishDate	2022
record_format	arxiv
title	Random Alloy Codes and the Fundamental Limits of Coded Distributed Tensors
topic	Information Theory; Distributed, Parallel, and Cluster Computing; Machine Learning; Numerical Analysis; Symbolic Computation; E.4; H.1.1; C.2.4; B.8.1; C.4; G.1.3; I.2.6; I.1.2
url	https://arxiv.org/abs/2202.03469

Similar Items