Bibliographic Details
Main Authors: Fabjančič, Matevž, Machidon, Octavian, Sharif, Hashim, Zhao, Yifan, Misailović, Saša, Pejović, Veljko
Format: Preprint
Published: 2023
Subjects: Machine Learning; Systems and Control
Online Access: https://arxiv.org/abs/2303.11291
author Fabjančič, Matevž
Machidon, Octavian
Sharif, Hashim
Zhao, Yifan
Misailović, Saša
Pejović, Veljko
contents Runtime-tunable, context-dependent network compression would make mobile deep learning (DL) adaptable to frequently varying resource availability, input "difficulty", or user needs. Existing compression techniques significantly reduce the memory, processing, and energy tax of DL, yet the resulting models tend to be permanently impaired, sacrificing inference power for reduced resource usage. Existing tunable compression approaches, on the other hand, require expensive re-training, do not support arbitrary strategies for adapting the compression, and do not provide mobile-ready implementations. In this paper we present Mobiprox, a framework enabling mobile DL with flexible precision. Mobiprox implements tunable approximations of tensor operations and enables runtime-adaptable approximation of individual network layers. A profiler and a tuner included with Mobiprox identify the most promising neural network approximation configurations, those that achieve the desired inference quality with minimal use of resources. Furthermore, we develop control strategies that, depending on contextual factors such as the input data difficulty, dynamically adjust the approximation levels across a mobile DL model's layers. We implement Mobiprox on Android OS and, through experiments in diverse mobile domains, including human activity recognition and spoken keyword detection, demonstrate that it can save up to 15% of system-wide energy with minimal impact on inference accuracy.
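The abstract describes two ideas that a short sketch may make more concrete: choosing approximation configurations that trade accuracy against resource use, and adjusting the approximation level at runtime based on input difficulty. The Python below is only an illustration under stated assumptions and is not Mobiprox's actual API or algorithm. The `Config` record, the function names, the use of classifier confidence as a proxy for input difficulty, and the threshold values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    """Hypothetical record of one profiled approximation configuration."""
    name: str
    accuracy: float  # profiled accuracy in [0, 1]
    energy: float    # relative energy cost, 1.0 = unapproximated model


def pareto_front(configs):
    """Return configs not dominated by any other, sorted by ascending energy.

    A config is dominated if another one is at least as accurate and at most
    as costly, and strictly better in at least one of the two.
    """
    front = []
    for c in configs:
        dominated = any(
            o.accuracy >= c.accuracy
            and o.energy <= c.energy
            and (o.accuracy > c.accuracy or o.energy < c.energy)
            for o in configs
        )
        if not dominated:
            front.append(c)
    return sorted(front, key=lambda c: c.energy)


def cheapest_meeting(front, min_accuracy):
    """Pick the lowest-energy config on the front that meets the accuracy target."""
    for c in front:  # front is sorted by ascending energy
        if c.accuracy >= min_accuracy:
            return c
    return front[-1]  # no config qualifies: fall back to the most accurate one


def adapt(level, confidence, n_levels, low=0.6, high=0.9):
    """Assumed confidence-driven controller (thresholds are illustrative).

    level 0 is the most approximate configuration, n_levels - 1 the most
    accurate. Low confidence suggests a hard input, so approximate less;
    high confidence suggests an easy input, so approximate more.
    """
    if confidence < low:
        return min(level + 1, n_levels - 1)
    if confidence > high:
        return max(level - 1, 0)
    return level
```

In use, the Pareto front would be computed once from profiling data, and `adapt` would be called after each inference with the model's latest confidence to select the index into that front for the next input.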
format Preprint
id arxiv_https___arxiv_org_abs_2303_11291
institution arXiv
publishDate 2023
record_format arxiv
title Mobiprox: Supporting Dynamic Approximate Computing on Mobiles
topic Machine Learning
Systems and Control
url https://arxiv.org/abs/2303.11291