Staff View: Higher Layers Need More LoRA Experts :: Library Catalog

Bibliographic Details
Main Authors:	Gao, Chongyang; Chen, Kezhen; Rao, Jinmeng; Sun, Baochen; Liu, Ruibo; Peng, Daiyi; Zhang, Yawen; Guo, Xiaoyuan; Yang, Jie; Subrahmanian, VS
Format:	Preprint
Published:	2024
Subjects:	Computation and Language; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2402.08562

_version_	1866917589021622272
author	Gao, Chongyang; Chen, Kezhen; Rao, Jinmeng; Sun, Baochen; Liu, Ruibo; Peng, Daiyi; Zhang, Yawen; Guo, Xiaoyuan; Yang, Jie; Subrahmanian, VS
author_facet	Gao, Chongyang; Chen, Kezhen; Rao, Jinmeng; Sun, Baochen; Liu, Ruibo; Peng, Daiyi; Zhang, Yawen; Guo, Xiaoyuan; Yang, Jie; Subrahmanian, VS
contents	Parameter-efficient tuning (PEFT) techniques like low-rank adaptation (LoRA) offer training efficiency on Large Language Models, but their impact on model performance remains limited. Recent efforts integrate LoRA and Mixture-of-Experts (MoE) to improve the performance of PEFT methods. Despite promising results, research on improving the efficiency of LoRA with MoE is still in its early stages. Recent studies have shown that experts in the MoE architecture have different strengths and also exhibit some redundancy. Does this statement also apply to parameter-efficient MoE? In this paper, we introduce a novel parameter-efficient MoE method, MoE-LoRA with Layer-wise Expert Allocation (MoLA), for Transformer-based models, where each model layer has the flexibility to employ a varying number of LoRA experts. We investigate several architectures with varying layer-wise expert configurations. Experiments on six well-known NLP and commonsense QA benchmarks demonstrate that MoLA achieves equal or superior performance compared to all baselines. We find that allocating more LoRA experts to higher layers further enhances the effectiveness of models with a certain number of experts in total. With far fewer parameters, this allocation strategy outperforms the setting with the same number of experts in every layer. This work can be widely used as a plug-and-play parameter-efficient tuning approach for various applications. The code is available at https://github.com/GCYZSL/MoLA.
format	Preprint
id	arxiv_https___arxiv_org_abs_2402_08562
institution	arXiv
publishDate	2024
record_format	arxiv
title	Higher Layers Need More LoRA Experts
topic	Computation and Language; Artificial Intelligence
url	https://arxiv.org/abs/2402.08562

Similar Items