Staff View: Communication-Computation Trade-Off in Resource-Constrained Edge Inference :: Library Catalog

Bibliographic Details
Main Authors:	Shao, Jiawei; Zhang, Jun
Format:	Preprint
Published:	2020
Subjects:	Machine Learning; Signal Processing
Online Access:	https://arxiv.org/abs/2006.02166

_version_	1866929564961210368
author	Shao, Jiawei; Zhang, Jun
author_facet	Shao, Jiawei; Zhang, Jun
contents	The recent breakthrough in artificial intelligence (AI), especially deep neural networks (DNNs), has affected every branch of science and technology. In particular, edge AI has been envisioned as a major application scenario for providing DNN-based services at edge devices. This article presents effective methods for edge inference on resource-constrained devices. It focuses on device-edge co-inference, assisted by an edge computing server, and investigates a critical trade-off between the computation cost of the on-device model and the communication cost of forwarding the intermediate feature to the edge server. A three-step framework is proposed for effective inference: (1) model split point selection to determine the on-device model, (2) communication-aware model compression to reduce the on-device computation and the resulting communication overhead simultaneously, and (3) task-oriented encoding of the intermediate feature to further reduce the communication overhead. Experiments demonstrate that our proposed framework achieves a better trade-off and significantly lower inference latency than baseline methods.
format	Preprint
id	arxiv_https___arxiv_org_abs_2006_02166
institution	arXiv
publishDate	2020
record_format	arxiv
title	Communication-Computation Trade-Off in Resource-Constrained Edge Inference
topic	Machine Learning; Signal Processing
url	https://arxiv.org/abs/2006.02166

Similar Items