Bibliographic Details
Main Authors: Genzel, Martin, Putzky, Patrick, Zhao, Pengfei, Schulze, Sebastian, Mollenhauer, Mattes, Seidel, Robert, Dietzel, Stefan, Wollmann, Thomas
Format: Preprint
Published: 2025
Subjects: Machine Learning
Online Access: https://arxiv.org/abs/2502.01717
_version_ 1866915605080178688
author Genzel, Martin
Putzky, Patrick
Zhao, Pengfei
Schulze, Sebastian
Mollenhauer, Mattes
Seidel, Robert
Dietzel, Stefan
Wollmann, Thomas
contents The adoption of Foundation Models in resource-constrained environments remains challenging due to their large size and inference costs. A promising way to overcome these limitations is post-training compression, which aims to balance reduced model size against performance degradation. This work presents Any Compression via Iterative Pruning (ACIP), a novel algorithmic approach to determine a compression-performance trade-off from a single stochastic gradient descent run. To achieve parameter efficiency, we use an SVD-reparametrization of linear layers and iteratively prune their singular values with a sparsity-inducing penalty. Importantly, the pruning order of the parameters is used to derive a global score map that allows compressing a model to any target size without re-computation. We evaluate ACIP on a large selection of open-weight LLMs and downstream tasks, demonstrating state-of-the-art results compared to existing factorization-based compression methods. We also show that ACIP seamlessly complements common quantization-based compression techniques.
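The abstract's core idea (factor each linear layer by SVD, rank its singular values with a global score, then cut to any parameter budget) can be sketched as follows. This is an illustrative sketch only, not the authors' implementation: the function names `svd_reparametrize` and `compress_to_target` are hypothetical, and the singular-value magnitude is used as a stand-in score, whereas the abstract says ACIP derives its score map from the pruning order observed during a gradient descent run with a sparsity-inducing penalty.

```python
import numpy as np


def svd_reparametrize(weight):
    """Factor a dense weight W (m x n) as U @ diag(s) @ Vt."""
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    return u, s, vt


def compress_to_target(layers, scores, target_params):
    """Keep the highest-scoring singular values across all layers within a budget.

    layers: list of (u, s, vt) triples; scores: list of 1-D arrays aligned with s.
    Returns the truncated factors and the number of parameters actually used.
    """
    entries = []
    for idx, ((u, s, vt), layer_scores) in enumerate(zip(layers, scores)):
        # Keeping one singular value costs one column of U plus one row of Vt.
        cost = u.shape[0] + vt.shape[1]
        for j, score in enumerate(layer_scores):
            entries.append((float(score), idx, j, cost))
    # One global ranking over all layers (assumption: higher score = more important).
    entries.sort(key=lambda e: e[0], reverse=True)

    masks = [np.zeros(len(s), dtype=bool) for (_, s, _) in layers]
    used = 0
    for _, idx, j, cost in entries:
        if used + cost > target_params:
            break  # budget reached; everything ranked lower is pruned
        masks[idx][j] = True
        used += cost

    compressed = [(u[:, m], s[m], vt[m, :]) for (u, s, vt), m in zip(layers, masks)]
    return compressed, used


# Usage: two toy "layers", compressed to two different budgets from the same scores.
rng = np.random.default_rng(0)
weights = [rng.standard_normal((8, 6)), rng.standard_normal((5, 4))]
layers = [svd_reparametrize(w) for w in weights]
scores = [s for (_, s, _) in layers]  # stand-in score, not the ACIP score map
small, used_small = compress_to_target(layers, scores, target_params=50)
full, used_full = compress_to_target(layers, scores, target_params=120)
```

Because the ranking is computed once and only the budget changes between calls, the same scores serve every target size, which mirrors the "without re-computation" property described in the abstract.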
format Preprint
id arxiv_https___arxiv_org_abs_2502_01717
institution arXiv
publishDate 2025
record_format arxiv
title Choose Your Model Size: Any Compression of Large Language Models Without Re-Computation
topic Machine Learning
url https://arxiv.org/abs/2502.01717