Staff View: Mobiprox: Supporting Dynamic Approximate Computing on Mobiles :: Library Catalog

Bibliographic Details
Main Authors:	Fabjančič, Matevž; Machidon, Octavian; Sharif, Hashim; Zhao, Yifan; Misailović, Saša; Pejović, Veljko
Format:	Preprint
Published:	2023
Subjects:	Machine Learning Systems and Control
Online Access:	https://arxiv.org/abs/2303.11291

_version_	1866929251859562496
author	Fabjančič, Matevž; Machidon, Octavian; Sharif, Hashim; Zhao, Yifan; Misailović, Saša; Pejović, Veljko
author_facet	Fabjančič, Matevž; Machidon, Octavian; Sharif, Hashim; Zhao, Yifan; Misailović, Saša; Pejović, Veljko
contents	Runtime-tunable context-dependent network compression would make mobile deep learning (DL) adaptable to often varying resource availability, input "difficulty", or user needs. The existing compression techniques significantly reduce the memory, processing, and energy tax of DL, yet the resulting models tend to be permanently impaired, sacrificing inference power for reduced resource usage. The existing tunable compression approaches, on the other hand, require expensive re-training, do not support arbitrary strategies for adapting the compression, and do not provide mobile-ready implementations. In this paper we present Mobiprox, a framework enabling mobile DL with flexible precision. Mobiprox implements tunable approximations of tensor operations and enables runtime-adaptable approximation of individual network layers. A profiler and a tuner included with Mobiprox identify the most promising neural network approximation configurations leading to the desired inference quality with the minimal use of resources. Furthermore, we develop control strategies that, depending on contextual factors such as the input data difficulty, dynamically adjust the approximation levels across a mobile DL model's layers. We implement Mobiprox on Android OS and, through experiments in diverse mobile domains, including human activity recognition and spoken keyword detection, demonstrate that it can save up to 15% system-wide energy with a minimal impact on the inference accuracy.
format	Preprint
id	arxiv_https___arxiv_org_abs_2303_11291
institution	arXiv
publishDate	2023
record_format	arxiv
title	Mobiprox: Supporting Dynamic Approximate Computing on Mobiles
topic	Machine Learning Systems and Control
url	https://arxiv.org/abs/2303.11291

Similar Items