Staff View: RoLoRA: Fine-tuning Rotated Outlier-free LLMs for Effective Weight-Activation Quantization :: Library Catalog

Bibliographic Details
Main Authors:	Huang, Xijie; Liu, Zechun; Liu, Shih-Yang; Cheng, Kwang-Ting
Format:	Preprint
Published:	2024
Subjects:	Computation and Language; Artificial Intelligence; Machine Learning
Online Access:	https://arxiv.org/abs/2407.08044

_version_	1866913520635871232
author	Huang, Xijie; Liu, Zechun; Liu, Shih-Yang; Cheng, Kwang-Ting
author_facet	Huang, Xijie; Liu, Zechun; Liu, Shih-Yang; Cheng, Kwang-Ting
contents	Low-Rank Adaptation (LoRA), as a representative Parameter-Efficient Fine-Tuning (PEFT) method, significantly enhances training efficiency by updating only a small portion of the weights in Large Language Models (LLMs). Recently, weight-only quantization techniques have also been applied to LoRA methods to reduce the memory footprint of fine-tuning. However, applying weight-activation quantization to the LoRA pipeline is under-explored, and we observe substantial performance degradation primarily due to the presence of activation outliers. In this work, we propose RoLoRA, the first LoRA-based scheme for effective weight-activation quantization. RoLoRA utilizes rotation for outlier elimination and proposes rotation-aware fine-tuning to preserve the outlier-free characteristics in rotated LLMs. Experimental results show RoLoRA consistently improves low-bit LoRA convergence and post-training quantization robustness in weight-activation settings. We evaluate RoLoRA across LLaMA2-7B/13B and LLaMA3-8B models, achieving up to a 29.5% absolute accuracy gain for 4-bit weight-activation quantized LLaMA2-13B on commonsense reasoning tasks compared to the LoRA baseline. We further demonstrate its effectiveness on Large Multimodal Models (LLaVA-1.5-7B). Code is available at https://github.com/HuangOwen/RoLoRA
format	Preprint
id	arxiv_https___arxiv_org_abs_2407_08044
institution	arXiv
publishDate	2024
record_format	arxiv
title	RoLoRA: Fine-tuning Rotated Outlier-free LLMs for Effective Weight-Activation Quantization
topic	Computation and Language; Artificial Intelligence; Machine Learning
url	https://arxiv.org/abs/2407.08044