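Staff View: Flexible Coded Distributed Convolution Computing for Enhanced Straggler Resilience and Numerical Stability in Distributed CNNs :: Library Catalog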

Bibliographic Details
Main Authors:	Tan, Shuo; Liu, Rui; Han, Xuesong; Long, XianLei; Wan, Kai; Song, Linqi; Li, Yong
Format:	Preprint
Published:	2024
Subjects:	Distributed, Parallel, and Cluster Computing; Artificial Intelligence; Computer Vision and Pattern Recognition; Information Theory; Machine Learning
Online Access:	https://arxiv.org/abs/2411.01579

_version_	1866918136516706304
author	Tan, Shuo; Liu, Rui; Han, Xuesong; Long, XianLei; Wan, Kai; Song, Linqi; Li, Yong
author_facet	Tan, Shuo; Liu, Rui; Han, Xuesong; Long, XianLei; Wan, Kai; Song, Linqi; Li, Yong
contents	Deploying Convolutional Neural Networks (CNNs) on resource-constrained devices necessitates efficient management of computational resources, often via distributed environments susceptible to latency from straggler nodes. This paper introduces the Flexible Coded Distributed Convolution Computing (FCDCC) framework to enhance straggler resilience and numerical stability in distributed CNNs. We extend Coded Distributed Computing (CDC) with Circulant and Rotation Matrix Embedding (CRME), which was originally proposed for matrix multiplication, to high-dimensional tensor convolution. For the proposed scheme, referred to as the Numerically Stable Coded Tensor Convolution (NSCTC) scheme, we also propose two new coded partitioning schemes: Adaptive-Padding Coded Partitioning (APCP) for the input tensor and Kernel-Channel Coded Partitioning (KCCP) for the filter tensor. These strategies allow tensor convolutions to be linearly decomposed and encoded into CDC subtasks, combining model parallelism with coded redundancy for robust and efficient execution. Theoretical analysis identifies an optimal trade-off between communication and storage costs. Empirical results validate the framework's effectiveness in computational efficiency, straggler resilience, and scalability across various CNN architectures.
format	Preprint
id	arxiv_https___arxiv_org_abs_2411_01579
institution	arXiv
publishDate	2024
record_format	arxiv
title	Flexible Coded Distributed Convolution Computing for Enhanced Straggler Resilience and Numerical Stability in Distributed CNNs
topic	Distributed, Parallel, and Cluster Computing; Artificial Intelligence; Computer Vision and Pattern Recognition; Information Theory; Machine Learning
url	https://arxiv.org/abs/2411.01579
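
The abstract in the contents field describes decomposing a convolution linearly and encoding the pieces into redundant subtasks so that slow workers can be ignored. The sketch below illustrates only that general idea and is not the paper's NSCTC, APCP, or KCCP construction. It assumes a real Vandermonde generator matrix in place of CRME (a simpler code that is known to become ill-conditioned as the number of workers grows), codes only the filter tensor along its output channels, and simulates workers in a single process. All names (`conv2d`, `encode_filters`, `decode`) and the evaluation points are hypothetical choices made for illustration.

```python
import numpy as np

def conv2d(x, w):
    """Valid cross-correlation. x: (C, H, W); w: (N, C, KH, KW)."""
    n_out, _, kh, kw = w.shape
    _, h, wd = x.shape
    out = np.zeros((n_out, h - kh + 1, wd - kw + 1))
    for i in range(kh):
        for j in range(kw):
            patch = x[:, i:i + out.shape[1], j:j + out.shape[2]]
            out += np.einsum("nc,chw->nhw", w[:, :, i, j], patch)
    return out

def encode_filters(w, k, n):
    """Split filters into k output-channel blocks, mix them into n coded blocks."""
    blocks = np.stack(np.split(w, k, axis=0))     # (k, N/k, C, KH, KW)
    nodes = np.linspace(-1.0, 1.0, n)             # distinct points (assumption)
    G = np.vander(nodes, k, increasing=True)      # (n, k) generator matrix
    coded = np.tensordot(G, blocks, axes=1)       # (n, N/k, C, KH, KW)
    return G, coded

def decode(G, results, finished):
    """Recover the uncoded output from any k finished workers."""
    G_sub = G[finished]                           # (k, k), invertible
    Y = np.stack([results[i] for i in finished])  # (k, N/k, OH, OW)
    flat = np.linalg.solve(G_sub, Y.reshape(len(finished), -1))
    # Undo the mixing, then concatenate the k blocks along output channels.
    return flat.reshape(Y.shape).reshape(-1, *Y.shape[2:])

rng = np.random.default_rng(0)
x = rng.standard_normal((3, 8, 8))       # input tensor: 3 channels, 8x8
w = rng.standard_normal((4, 3, 3, 3))    # 4 filters of size 3x3
k, n = 2, 4                              # any 2 of 4 workers suffice

G, coded = encode_filters(w, k, n)
# Each simulated worker convolves the input with one coded filter block.
results = {i: conv2d(x, coded[i]) for i in range(n)}

finished = [1, 3]                        # workers 0 and 2 are stragglers
recovered = decode(G, results, finished)
assert np.allclose(recovered, conv2d(x, w))
```

Because convolution is linear in the filter, each worker's result is the same linear combination of the uncoded outputs that was applied to the filter blocks, so decoding reduces to solving a small k-by-k system. The paper's stated contribution is to replace this kind of ill-conditioned code with a numerically stable one and to code the input tensor as well.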