Staff View: CPUBone: Efficient Vision Backbone Design for Devices with Low Parallelization Capabilities :: Library Catalog


Bibliographic Details
Main Authors:	Nottebaum, Moritz; Dunnhofer, Matteo; Micheloni, Christian
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2603.26425

author	Nottebaum, Moritz; Dunnhofer, Matteo; Micheloni, Christian
contents	Recent research on vision backbone architectures has predominantly focused on optimizing efficiency for hardware platforms with high parallel processing capabilities, a category that increasingly includes embedded systems such as mobile phones and embedded AI accelerator modules. CPUs, in contrast, cannot parallelize operations to the same degree, so models targeting them benefit from a specific design philosophy that balances the number of operations (multiply-accumulate operations, MACs) against hardware-efficient execution, i.e., a high number of MACs per second (MACpS). To this end, we investigate two modifications to standard convolutions aimed at reducing computational cost: grouping convolutions and reducing kernel sizes. While both adaptations substantially decrease the total number of MACs required for inference, sustaining low latency also requires preserving hardware efficiency. Our experiments across diverse CPU devices confirm that these adaptations retain high hardware efficiency on CPUs. Based on these insights, we introduce CPUBone, a new family of vision backbone models optimized for CPU-based inference. CPUBone achieves state-of-the-art speed-accuracy trade-offs (SATs) across a wide range of CPU devices and effectively transfers its efficiency to downstream tasks such as object detection and semantic segmentation. Models and code are available at https://github.com/altair199797/CPUBone.
format	Preprint
id	arxiv_https___arxiv_org_abs_2603_26425
institution	arXiv
publishDate	2026
record_format	arxiv
title	CPUBone: Efficient Vision Backbone Design for Devices with Low Parallelization Capabilities
topic	Computer Vision and Pattern Recognition; Artificial Intelligence
url	https://arxiv.org/abs/2603.26425
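
The cost argument in the abstract (grouping convolutions and shrinking kernels cut MACs, but latency only drops if MACpS stays high) can be made concrete with the standard MAC count of a 2D convolution. The sketch below is illustrative only: the helper names `conv2d_macs` and `latency_ms`, the layer shape, and the throughput figures are hypothetical and are not taken from CPUBone or its repository. The latency estimate assumes the simple first-order relation latency ≈ MACs / MACpS implied by the abstract's framing.

```python
def conv2d_macs(c_in: int, c_out: int, kernel: int,
                h_out: int, w_out: int, groups: int = 1) -> int:
    """Multiply-accumulate count of a 2D convolution (bias ignored).

    Each output value of a grouped convolution reads only c_in // groups
    input channels over a kernel x kernel window, so grouping divides the
    cost by `groups` and shrinking the kernel scales it by kernel**2.
    """
    if c_in % groups or c_out % groups:
        raise ValueError("c_in and c_out must be divisible by groups")
    return c_out * h_out * w_out * (c_in // groups) * kernel * kernel


def latency_ms(macs: int, macps: float) -> float:
    """Hypothetical first-order latency model: time = MACs / (MACs per second)."""
    return macs / macps * 1e3


if __name__ == "__main__":
    # Hypothetical layer: 128 -> 128 channels on a 56x56 output feature map.
    standard = conv2d_macs(128, 128, 3, 56, 56)
    grouped = conv2d_macs(128, 128, 3, 56, 56, groups=4)
    smaller_kernel = conv2d_macs(128, 128, 1, 56, 56)
    print(standard, grouped, smaller_kernel)  # -> 462422016 115605504 51380224

    # Hypothetical throughputs: grouping cuts MACs by 4x; if the grouped layer
    # runs at half the MACpS of the standard one, latency still halves.
    print(latency_ms(standard, 50e9), latency_ms(grouped, 25e9))
```

The point of pairing the two helpers is that a MAC reduction translates into a speedup only in proportion to how much MACpS is preserved; a variant that reduced MACs by 4x but also lost 4x in MACpS would show no latency gain under this model.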