Bibliographic Details
Main Authors: Lopes, Alexandre; Santos, Fernando Pereira dos; de Oliveira, Diulhio; Schiezaro, Mauricio; Pedrini, Helio
Format: Preprint
Published: 2024
Subjects: Computer Vision and Pattern Recognition
Online Access: https://arxiv.org/abs/2408.08250
_version_ 1866913468755476480
author Lopes, Alexandre
Santos, Fernando Pereira dos
de Oliveira, Diulhio
Schiezaro, Mauricio
Pedrini, Helio
author_facet Lopes, Alexandre
Santos, Fernando Pereira dos
de Oliveira, Diulhio
Schiezaro, Mauricio
Pedrini, Helio
contents Deep neural networks have consistently represented the state of the art in most computer vision problems. In these scenarios, larger and more complex models have demonstrated superior performance to smaller architectures, especially when trained with plenty of representative data. With the recent adoption of Vision Transformer (ViT) based architectures and advanced Convolutional Neural Networks (CNNs), the total number of parameters of leading backbone architectures increased from 62M parameters in 2012 with AlexNet to 7B parameters in 2024 with AIM-7B. Consequently, deploying such deep architectures faces challenges in environments with processing and runtime constraints, particularly in embedded systems. This paper covers the main model compression techniques applied to computer vision tasks, enabling modern models to be used in embedded systems. We present the characteristics of compression subareas, compare different approaches, and discuss how to choose the best technique and what variations to expect when evaluating it on various embedded devices. We also share code to assist researchers and new practitioners in overcoming initial implementation challenges for each subarea and present trends for model compression. Case studies for model compression are available at https://github.com/venturusbr/cv-model-compression.
format Preprint
id arxiv_https___arxiv_org_abs_2408_08250
institution arXiv
publishDate 2024
record_format arxiv
title Computer Vision Model Compression Techniques for Embedded Systems: A Survey
topic Computer Vision and Pattern Recognition
url https://arxiv.org/abs/2408.08250