Staff View: Hidden costs for inference with deep network on embedded system devices :: Library Catalog

Bibliographic Details
Main Authors:	Lee, Chankyu; Choi, Woohyun; Park, Sangwook
Format:	Preprint
Published:	2026
Subjects:	Computational Complexity; Machine Learning
Online Access:	https://arxiv.org/abs/2601.01698

_version_	1866911353551192064
author	Lee, Chankyu; Choi, Woohyun; Park, Sangwook
author_facet	Lee, Chankyu; Choi, Woohyun; Park, Sangwook
contents	This study evaluates the inference performance of various deep learning models in an embedded system environment. Previous work typically uses the number of Multiply-Accumulate (MAC) operations to measure the computational load of a deep model. According to this study, however, this metric is limited as an estimator of inference time on embedded devices. The paper asks what is overlooked when computational load is expressed in terms of Multiply-Accumulate operations. In the experiments, an image classification task is performed on an embedded device using the CIFAR-100 dataset, and the inference times of ten deep models are compared with the theoretically calculated Multiply-Accumulate operations for each model. The results highlight the importance of considering additional computations between tensors when optimizing deep learning models for real-time performance on embedded systems.
format	Preprint
id	arxiv_https___arxiv_org_abs_2601_01698
institution	arXiv
publishDate	2026
record_format	arxiv
title	Hidden costs for inference with deep network on embedded system devices
topic	Computational Complexity; Machine Learning
url	https://arxiv.org/abs/2601.01698

Similar Items