

Bibliographic Details
Main Authors:	Lu, Jun; Xu, Tianyi; Ding, Bill; Li, David; Kang, Yu
Format:	Preprint
Published:	2025
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2503.17101

author	Lu, Jun; Xu, Tianyi; Ding, Bill; Li, David; Kang, Yu
author_facet	Lu, Jun; Xu, Tianyi; Ding, Bill; Li, David; Kang, Yu
contents	In this paper, we tackle the critical challenge of compressing large language models (LLMs) to facilitate their practical deployment and broader adoption. We introduce a novel post-training compression paradigm that focuses on low-rank decomposition of LLM weights. Our analysis identifies two main challenges in this task: the variability in LLM activation distributions and handling unseen activations from different datasets and models. To address these challenges, we propose a nested activation-aware framework (NSVD) for LLMs, a training-free approach designed to enhance the accuracy of low-rank decompositions by managing activation outliers through transforming the weight matrix based on activation distribution and the original weight matrix. This method allows for the absorption of outliers into the transformed weight matrix, improving decomposition accuracy. Our comprehensive evaluation across eight datasets and six models from three distinct LLM families demonstrates the superiority of NSVD over current state-of-the-art methods, especially at medium to large compression ratios or in multilingual and multitask settings.
format	Preprint
id	arxiv_https___arxiv_org_abs_2503_17101
institution	arXiv
publishDate	2025
record_format	arxiv
title	Large Language Model Compression via the Nested Activation-Aware Decomposition
topic	Machine Learning
url	https://arxiv.org/abs/2503.17101
