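| Main Authors: | Genzel, Martin; Putzky, Patrick; Zhao, Pengfei; Schulze, Sebastian; Mollenhauer, Mattes; Seidel, Robert; Dietzel, Stefan; Wollmann, Thomas |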
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | Machine Learning |
| Online Access: | https://arxiv.org/abs/2502.01717 |
| _version_ | 1866915605080178688 |
|---|---|
| author | Genzel, Martin; Putzky, Patrick; Zhao, Pengfei; Schulze, Sebastian; Mollenhauer, Mattes; Seidel, Robert; Dietzel, Stefan; Wollmann, Thomas |
| author_facet | Genzel, Martin; Putzky, Patrick; Zhao, Pengfei; Schulze, Sebastian; Mollenhauer, Mattes; Seidel, Robert; Dietzel, Stefan; Wollmann, Thomas |
| contents | The adoption of Foundation Models in resource-constrained environments remains challenging due to their large size and inference costs. A promising way to overcome these limitations is post-training compression, which aims to balance reduced model size against performance degradation. This work presents Any Compression via Iterative Pruning (ACIP), a novel algorithmic approach to determine a compression-performance trade-off from a single stochastic gradient descent run. To achieve parameter efficiency, we use an SVD-reparametrization of linear layers and iteratively prune their singular values with a sparsity-inducing penalty. Importantly, the pruning order of the parameters is used to derive a global score map that allows compressing a model to any target size without re-computation. We evaluate ACIP on a large selection of open-weight LLMs and downstream tasks, demonstrating state-of-the-art results compared to existing factorization-based compression methods. We also show that ACIP seamlessly complements common quantization-based compression techniques. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2502_01717 |
| institution | arXiv |
| publishDate | 2025 |
| record_format | arxiv |
| title | Choose Your Model Size: Any Compression of Large Language Models Without Re-Computation |
| topic | Machine Learning |
| url | https://arxiv.org/abs/2502.01717 |
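
The abstract describes a global score map over singular values that lets a model be compressed to any target size without re-computation. The sketch below illustrates only that selection step, under stated assumptions: each linear layer is stored as a rank-r factorization costing r × (rows + cols) parameters, the per-singular-value scores are made-up numbers, and the greedy cut-off is a simplification chosen for illustration. The function names, layer names, and scores are hypothetical and are not taken from the ACIP paper or its code.

```python
from typing import Dict, List, Tuple


def factorized_params(rows: int, cols: int, rank: int) -> int:
    """Parameter count of a rank-`rank` factorization: (rows x rank) times (rank x cols)."""
    return rank * (rows + cols)


def select_ranks(
    shapes: Dict[str, Tuple[int, int]],
    scores: Dict[str, List[float]],
    budget: int,
) -> Dict[str, int]:
    """Keep the globally highest-scoring singular values that fit within `budget` parameters.

    Assumption: a higher score means "pruned later", i.e. more important.
    """
    # Flatten all layers into one list of (score, layer name) entries.
    entries = [(s, name) for name, layer_scores in scores.items() for s in layer_scores]
    entries.sort(key=lambda e: e[0], reverse=True)

    ranks = {name: 0 for name in shapes}
    used = 0
    for _, name in entries:
        rows, cols = shapes[name]
        cost = rows + cols  # one extra singular value adds one column and one row
        if used + cost > budget:
            break  # stop at the first entry that does not fit (a global threshold)
        ranks[name] += 1
        used += cost
    return ranks


# Hypothetical two-layer model with illustrative scores.
shapes = {"attn": (4, 4), "mlp": (4, 8)}
scores = {"attn": [0.9, 0.5, 0.2, 0.1], "mlp": [0.8, 0.6, 0.3, 0.05]}

for budget in (20, 40):
    ranks = select_ranks(shapes, scores, budget)
    total = sum(factorized_params(*shapes[n], r) for n, r in ranks.items())
    print(budget, ranks, total)
```

Because the scores are computed once and only the budget changes, a different target size needs just another call to `select_ranks`, which mirrors the "without re-computation" idea in the abstract.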