MARC View :: Library Catalog

Bibliographic Details
Main Authors:	Shu, Yao; Hu, Wenyang; Ng, See-Kiong; Low, Bryan Kian Hsiang; Yu, Fei Richard
Format:	Preprint
Published:	2024
Subjects:	Machine Learning; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2409.06277

_version_	1866910993538351104
author	Shu, Yao; Hu, Wenyang; Ng, See-Kiong; Low, Bryan Kian Hsiang; Yu, Fei Richard
author_facet	Shu, Yao; Hu, Wenyang; Ng, See-Kiong; Low, Bryan Kian Hsiang; Yu, Fei Richard
contents	Large Language Models (LLMs) have become indispensable in numerous real-world applications. However, fine-tuning these models at scale, especially in federated settings where data privacy and communication efficiency are critical, presents significant challenges. Existing approaches often resort to parameter-efficient fine-tuning (PEFT) to mitigate communication overhead, but this typically comes at the cost of model accuracy. To this end, we propose federated full-parameter tuning at scale for LLMs (Ferret), the first first-order method with shared randomness to enable scalable full-parameter tuning of LLMs across decentralized data sources while maintaining competitive model accuracy. Ferret accomplishes this through three aspects: (i) it employs widely used first-order methods for efficient local updates; (ii) it projects these updates into a low-dimensional space to considerably reduce communication overhead; and (iii) it reconstructs local updates from this low-dimensional space with shared randomness to facilitate effective full-parameter global aggregation, ensuring fast convergence and competitive final performance. Our rigorous theoretical analyses and insights, along with extensive experiments, show that Ferret significantly enhances the scalability of existing federated full-parameter tuning approaches by achieving high computational efficiency, reduced communication overhead, and fast convergence, all while maintaining competitive model accuracy. Our implementation is available at https://github.com/allen4747/Ferret.
format	Preprint
id	arxiv_https___arxiv_org_abs_2409_06277
institution	arXiv
publishDate	2024
record_format	arxiv
title	Ferret: Federated Full-Parameter Tuning at Scale for Large Language Models
topic	Machine Learning; Artificial Intelligence
url	https://arxiv.org/abs/2409.06277
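
Illustrative note: steps (ii) and (iii) of the abstract (projecting a local update into a low-dimensional space, then reconstructing it with shared randomness) can be sketched roughly as follows. This is a generic Gaussian random-projection sketch, not the authors' actual procedure; Ferret's real projection and reconstruction may differ, and the linked repository is the authoritative source. All function names, the seed value, and the choice of Gaussian bases are assumptions made for illustration.

```python
import random


def random_bases(seed, num_bases, dim):
    """Regenerate the same Gaussian basis vectors from a shared seed.

    Hypothetical helper: client and server both call it with the same
    seed, so the bases themselves never have to be transmitted.
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bases)]


def project(update, seed, num_bases):
    """Client side: compress a dim-sized update into num_bases scalars."""
    bases = random_bases(seed, num_bases, len(update))
    return [sum(b_i * u_i for b_i, u_i in zip(basis, update)) for basis in bases]


def reconstruct(coords, seed, dim):
    """Server side: rebuild an estimate of the update from the scalars.

    With standard Gaussian bases, the average of coord * basis is an
    unbiased (but noisy) estimate of the original update.
    """
    bases = random_bases(seed, len(coords), dim)
    estimate = [0.0] * dim
    for coord, basis in zip(coords, bases):
        for i in range(dim):
            estimate[i] += coord * basis[i] / len(coords)
    return estimate


def aggregate(client_coords, seed, dim):
    """Average the scalars across clients, then reconstruct once.

    Reconstruction is linear, so this equals averaging the per-client
    reconstructions.
    """
    num_bases = len(client_coords[0])
    mean_coords = [
        sum(coords[j] for coords in client_coords) / len(client_coords)
        for j in range(num_bases)
    ]
    return reconstruct(mean_coords, seed, dim)


if __name__ == "__main__":
    seed, dim, num_bases = 42, 4, 3  # illustrative values only
    updates = [[1.0, -2.0, 0.5, 3.0], [0.0, 1.0, 1.0, -1.0]]
    sent = [project(u, seed, num_bases) for u in updates]
    print("scalars sent per client:", len(sent[0]))
    print("aggregated estimate:", aggregate(sent, seed, dim))
```

In this sketch each client transmits only `num_bases` scalars instead of `dim` parameters, which is the source of the communication saving. The quality of the reconstruction depends on how many bases are used.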