Bibliographic Details
Main Authors: Lee, Chankyu; Choi, Woohyun; Park, Sangwook
Format: Preprint
Published: 2026
Subjects: Computational Complexity; Machine Learning
Online Access: https://arxiv.org/abs/2601.01698
Abstract: This study evaluates the inference performance of various deep learning models in an embedded system environment. Previous works typically use the number of Multiply-Accumulate (MAC) operations to measure the computational load of a deep model. This study finds, however, that this metric is limited in its ability to estimate inference time on embedded devices, and asks which aspects are overlooked when computational cost is expressed only in terms of MAC operations. In the experiments, an image classification task on the CIFAR-100 dataset is performed on an embedded system device, and the measured inference times of ten deep models are compared and analyzed against the theoretically calculated MAC operations of each model. The results highlight the importance of accounting for additional computations between tensors when optimizing deep learning models for real-time performance on embedded systems.
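As a rough illustration of the gap described above, the sketch below computes the standard theoretical MAC count of a convolution alongside element-wise tensor operations (ReLU, residual addition) that a MAC count scores as zero. Reading "additional computations between tensors" as such element-wise operations is an assumption, and the function names, the ResNet-style block, the float32 element size, and the layer shapes are hypothetical choices for illustration, not taken from the paper.

```python
# Sketch only: the block layout, shapes and float32 size are illustrative assumptions.

def conv2d_macs(c_in, c_out, k, h_out, w_out, groups=1):
    """Theoretical MACs of a k x k 2-D convolution (bias ignored):
    one MAC per weight per output position."""
    return (c_in // groups) * k * k * c_out * h_out * w_out


def elementwise_elems(c, h, w):
    """Elements touched by one element-wise op (ReLU, residual add, ...).
    Such ops contribute zero MACs but still take time on the device."""
    return c * h * w


def add_bytes(c, h, w, bytes_per_elem=4):
    """Memory traffic of a residual add: read two tensors, write one
    (float32, i.e. 4 bytes per element, assumed)."""
    return 3 * elementwise_elems(c, h, w) * bytes_per_elem


def basic_block_cost(c, h, w):
    """Hypothetical ResNet-style basic block on a C x H x W feature map:
    two 3x3 convs (stride 1, same padding), two ReLUs, one skip-connection add."""
    macs = 2 * conv2d_macs(c, c, 3, h, w)
    elementwise = 3 * elementwise_elems(c, h, w)  # 2 ReLUs + 1 add, all 0 MACs
    return macs, elementwise


if __name__ == "__main__":
    # CIFAR-sized 32x32 feature map with 16 channels (illustrative shape).
    macs, ew = basic_block_cost(16, 32, 32)
    print(macs, ew, add_bytes(16, 32, 32))  # prints: 4718592 49152 196608
```

The element-wise work looks negligible next to the MACs here, but a MAC-only metric assigns it no cost at all, even though each such op still reads and writes whole tensors. For a depthwise 3x3 convolution on the same map (`conv2d_macs(16, 16, 3, 32, 32, groups=16)`, 147,456 MACs), the same element-wise operations are a much larger fraction of the total.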
Institution: arXiv
Title: Hidden costs for inference with deep network on embedded system devices
Topic: Computational Complexity; Machine Learning