Staff View: Collaborative Inference Acceleration with Non-Penetrative Tensor Partitioning :: Library Catalog

Bibliographic Details
Main Authors:	Liu, Zhibang, Xu, Chaonong, Lv, Zhenjie, Liu, Zhizhuo, Zhao, Suyu
Format:	Preprint
Published:	2025
Subjects:	Distributed, Parallel, and Cluster Computing
Online Access:	https://arxiv.org/abs/2501.04489

_version_	1866916556814942208
author	Liu, Zhibang Xu, Chaonong Lv, Zhenjie Liu, Zhizhuo Zhao, Suyu
author_facet	Liu, Zhibang Xu, Chaonong Lv, Zhenjie Liu, Zhizhuo Zhao, Suyu
contents	Inference on large images on Internet of Things (IoT) devices is commonly hindered by limited resources, while Deep Neural Network (DNN) inference often has stringent latency requirements. This problem is currently addressed by collaborative inference, where the large image is partitioned into multiple tiles and each tile is assigned to an IoT device for processing. However, tile sharing incurs communication overhead and therefore significant latency, so the existing collaborative inference strategy is inefficient for convolutional computation, which is indispensable for any DNN. To reduce this overhead, we propose Non-Penetrative Tensor Partitioning (NPTP), a fine-grained tensor partitioning method that reduces communication latency by minimizing the communication load of shared tiles, thereby reducing inference latency. We evaluate NPTP with four widely adopted DNN models. Experimental results demonstrate that NPTP achieves a 1.44-1.68x inference speedup relative to CoEdge, a state-of-the-art (SOTA) collaborative inference algorithm.
format	Preprint
id	arxiv_https___arxiv_org_abs_2501_04489
institution	arXiv
publishDate	2025
record_format	arxiv
title	Collaborative Inference Acceleration with Non-Penetrative Tensor Partitioning
topic	Distributed, Parallel, and Cluster Computing
url	https://arxiv.org/abs/2501.04489
