| Main Authors: | Lee, Chankyu; Choi, Woohyun; Park, Sangwook |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | Computational Complexity; Machine Learning |
| Online Access: | https://arxiv.org/abs/2601.01698 |
| _version_ | 1866911353551192064 |
|---|---|
| author | Lee, Chankyu; Choi, Woohyun; Park, Sangwook |
| author_facet | Lee, Chankyu; Choi, Woohyun; Park, Sangwook |
| contents | This study evaluates the inference performance of various deep learning models in an embedded system environment. Previous work typically uses the number of Multiply-Accumulate (MAC) operations to measure the computational load of a deep model. According to this study, however, this metric is limited as an estimator of inference time on embedded devices. The paper asks which costs are overlooked when a model's load is expressed only in terms of MAC operations. In the experiments, an image classification task is performed on an embedded device using the CIFAR-100 dataset, and the measured inference times of ten deep models are compared with the theoretically calculated MAC operations for each model. The results highlight the importance of considering additional computations between tensors when optimizing deep learning models for real-time performance on embedded systems. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2601_01698 |
| institution | arXiv |
| publishDate | 2026 |
| record_format | arxiv |
| title | Hidden costs for inference with deep network on embedded system devices |
| topic | Computational Complexity; Machine Learning |
| url | https://arxiv.org/abs/2601.01698 |
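
The abstract contrasts theoretically calculated Multiply-Accumulate counts with measured inference time. The sketch below illustrates only the counting side, under assumptions that are not taken from the paper: the function names (`conv2d_macs`, `linear_macs`, `elementwise_ops`) and the example residual block are hypothetical, and the formulas are the standard textbook counts rather than the authors' own accounting. It shows how a tensor-to-tensor operation such as a residual addition contributes nothing to a MAC total even though it still reads and writes every element.

```python
def conv2d_macs(c_in, c_out, kernel, h_out, w_out, groups=1):
    """Standard MAC count for a 2D convolution with a square kernel.

    Each output element needs (c_in / groups) * kernel * kernel
    multiply-accumulates, and there are c_out * h_out * w_out output elements.
    """
    return (c_in // groups) * kernel * kernel * c_out * h_out * w_out


def linear_macs(in_features, out_features):
    """Standard MAC count for a fully connected layer (bias ignored)."""
    return in_features * out_features


def elementwise_ops(channels, height, width):
    """Number of elements touched by a tensor-to-tensor op (e.g. a residual add).

    A MAC-only budget records this as zero, although the device still has to
    read and write every element.
    """
    return channels * height * width


# Hypothetical residual block: two 3x3 convolutions on a 64-channel,
# 32x32 feature map, followed by an element-wise skip-connection add.
block_macs = 2 * conv2d_macs(64, 64, 3, 32, 32)
skip_add_elements = elementwise_ops(64, 32, 32)

print("MACs counted for the block:", block_macs)
print("Elements touched by the skip add (0 MACs):", skip_add_elements)
```

A count like this describes only arithmetic inside convolution and linear layers. Whether the uncounted operations matter on a given device has to be checked by timing the model on that device, which is the comparison the abstract describes.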