Bibliographic Details
Main Authors: Sah, Sudhakar, Chabbra, Nikhil, Durnerin, Matthieu
Format: Preprint
Published: 2025
Subjects: Computer Vision and Pattern Recognition
Online Access: https://arxiv.org/abs/2511.11716
_version_ 1866918202386153472
author Sah, Sudhakar
Chabbra, Nikhil
Durnerin, Matthieu
author_facet Sah, Sudhakar
Chabbra, Nikhil
Durnerin, Matthieu
contents Deep Convolutional Neural Networks (CNNs) are increasingly difficult to deploy on microcontrollers (MCUs) and lightweight Neural Processing Units (NPUs) because of their growing size and compute demands. Low-rank tensor decomposition, such as Tucker factorization, is a promising way to reduce parameters and operations with reasonable accuracy loss. However, existing approaches select ranks locally and often ignore global trade-offs between compression and accuracy. We introduce CompressNAS, a MicroNAS-inspired framework that treats rank selection as a global search problem. CompressNAS employs a fast accuracy estimator to evaluate candidate decompositions, enabling efficient yet exhaustive rank exploration under memory and accuracy constraints. On ImageNet, CompressNAS compresses ResNet-18 by 8x with less than a 4% accuracy drop; on COCO, we achieve 2x compression of YOLOv5s without any accuracy drop and 2x compression of YOLOv5n with a 2.5% drop. Finally, we present a new family of compressed models, STResNet, with competitive performance compared to other efficient models.
format Preprint
id arxiv_https___arxiv_org_abs_2511_11716
institution arXiv
publishDate 2025
record_format arxiv
title CompressNAS: A Fast and Efficient Technique for Model Compression using Decomposition
topic Computer Vision and Pattern Recognition
url https://arxiv.org/abs/2511.11716
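note The abstract states that Tucker factorization reduces a layer's parameters. A minimal sketch of that arithmetic follows, assuming the common Tucker-2 form in which a k x k convolution is replaced by a 1x1 convolution, a smaller k x k convolution, and another 1x1 convolution. The function names and the example ranks are hypothetical and chosen only for illustration; they are not taken from the paper, and this is not the CompressNAS rank search itself.

```python
def conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weight count of a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k


def tucker2_params(c_in: int, c_out: int, k: int, r_in: int, r_out: int) -> int:
    """Weight count after a Tucker-2 split into three convolutions.

    Assumed layout (illustrative, not the paper's exact scheme):
      1x1 conv: c_in  -> r_in
      kxk conv: r_in  -> r_out
      1x1 conv: r_out -> c_out
    """
    return c_in * r_in + r_in * r_out * k * k + r_out * c_out


def compression_ratio(c_in: int, c_out: int, k: int, r_in: int, r_out: int) -> float:
    """Original weight count divided by the factorized weight count."""
    return conv_params(c_in, c_out, k) / tucker2_params(c_in, c_out, k, r_in, r_out)


if __name__ == "__main__":
    # Hypothetical layer: 256 -> 256 channels, 3x3 kernel, ranks (64, 64).
    print(conv_params(256, 256, 3))                 # 589824
    print(tucker2_params(256, 256, 3, 64, 64))      # 69632
    print(round(compression_ratio(256, 256, 3, 64, 64), 2))
```

Lower ranks shrink the layer further but usually cost more accuracy, which is why the abstract frames the choice of ranks across all layers as a global search problem rather than a per-layer decision.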