| Main Author: | González-Martínez, David |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | Machine Learning |
| Online Access: | https://arxiv.org/abs/2509.25136 |
| _version_ | 1866909839416885248 |
|---|---|
| author | González-Martínez, David |
| contents | Neural network compression techniques typically require expensive fine-tuning or search procedures, rendering them impractical on commodity hardware. Inspired by recent LLM compression research, we present a general activation-aware factorization framework that can be applied to a broad range of layers. Moreover, we introduce a scalable budgeted rank allocator that allows flexible control over compression targets (e.g., retaining 50% of parameters) with no overhead. Together, these components form BALF, an efficient pipeline for compressing models without fine-tuning. We demonstrate its effectiveness across multiple scales and architectures, from ResNet-20 on CIFAR-10 to ResNeXt-101 and vision transformers on ImageNet, and show that it achieves excellent results in the fine-tuning-free regime. For instance, BALF reduces FLOPs on ResNeXt-101 by 45% with only a 1-percentage-point top-1 accuracy drop. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2509_25136 |
| institution | arXiv |
| publishDate | 2025 |
| record_format | arxiv |
| title | BALF: Budgeted Activation-Aware Low-Rank Factorization for Fine-Tuning-Free Model Compression |
| topic | Machine Learning |
| url | https://arxiv.org/abs/2509.25136 |
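The abstract names two ingredients, activation-aware factorization and rank selection under a parameter budget. The sketch below illustrates both ideas in their simplest generic form; it is not BALF's actual procedure. It assumes a linear layer with weight `W` of shape (out, in) and a matrix `X` of calibration inputs, and it uses one common activation-aware technique (whitening the truncated SVD with a Cholesky factor of the input Gram matrix) so that the rank-r approximation minimizes the output error on `X` rather than the raw weight error. The per-layer helper `rank_for_budget` uses only the fact that a rank-r factorization of an m×n matrix stores r(m+n) parameters; the paper's allocator distributes ranks across layers, which this sketch does not attempt. Both function names and the `eps` ridge term are hypothetical.

```python
import numpy as np

def factorize_activation_aware(W, X, rank, eps=1e-6):
    """Hypothetical helper: return A (out, rank), B (rank, in) with W ~= A @ B,
    chosen to minimize ||X @ (W - A @ B).T||_F (error on layer outputs)."""
    # Gram matrix of the calibration inputs; small ridge keeps it positive definite.
    G = X.T @ X + eps * np.eye(X.shape[1])
    L = np.linalg.cholesky(G)  # lower triangular, G = L @ L.T
    # Since ||X M^T||_F^2 = tr(M G M^T) = ||M L||_F^2, minimizing output error
    # equals finding the best rank-r approximation of W @ L (Eckart-Young).
    U, S, Vt = np.linalg.svd(W @ L, full_matrices=False)
    A = U[:, :rank] * S[:rank]  # absorb singular values into the left factor
    # Undo the whitening: B = Vt_r @ inv(L), computed via a linear solve.
    B = np.linalg.solve(L.T, Vt[:rank].T).T
    return A, B

def rank_for_budget(m, n, keep_ratio):
    """Hypothetical helper: largest rank r with r * (m + n) <= keep_ratio * m * n."""
    return max(1, int(keep_ratio * m * n // (m + n)))
```

For example, a 512×512 layer holds 262,144 parameters; keeping 50% gives `rank_for_budget(512, 512, 0.5) == 128`, since 128 × (512 + 512) = 131,072. Note that factorizing only saves parameters when r < mn/(m+n).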