Staff View: On Giant's Shoulders: Effortless Weak to Strong by Dynamic Logits Fusion :: Library Catalog

Bibliographic Details
Main Authors:	Fan, Chenghao; Lu, Zhenyi; Wei, Wei; Tian, Jie; Qu, Xiaoye; Chen, Dangyang; Cheng, Yu
Format:	Preprint
Published:	2024
Subjects:	Computation and Language; Artificial Intelligence; Machine Learning
Online Access:	https://arxiv.org/abs/2406.15480

_version_	1866910647802920960
author	Fan, Chenghao; Lu, Zhenyi; Wei, Wei; Tian, Jie; Qu, Xiaoye; Chen, Dangyang; Cheng, Yu
author_facet	Fan, Chenghao; Lu, Zhenyi; Wei, Wei; Tian, Jie; Qu, Xiaoye; Chen, Dangyang; Cheng, Yu
contents	Efficient fine-tuning of large language models for task-specific applications is imperative, yet the vast number of parameters in these models makes their training increasingly challenging. Despite numerous proposals for effective methods, a substantial memory overhead remains for gradient computations during updates. Can we fine-tune a series of task-specific small models and transfer their knowledge directly to a much larger model without additional training? In this paper, we explore weak-to-strong specialization using logit arithmetic, facilitating a direct answer to this question. Existing weak-to-strong methods often employ a static knowledge transfer ratio and a single small model for transferring complex knowledge, which leads to suboptimal performance. To surmount these limitations, we propose a dynamic logit fusion approach that works with a series of task-specific small models, each specialized in a different task. This method adaptively allocates weights among these models at each decoding step, learning the weights through Kullback-Leibler divergence constrained optimization problems. We conduct extensive experiments across various benchmarks in both single-task and multi-task settings, achieving leading results. By transferring expertise from the 7B model to the 13B model, our method closes the performance gap by 96.4% in single-task scenarios and by 86.3% in multi-task scenarios compared to full fine-tuning of the 13B model. Notably, we achieve surpassing performance on unseen tasks. Moreover, we further demonstrate that our method can effortlessly integrate in-context learning for single tasks and task arithmetic for multi-task scenarios.
format	Preprint
id	arxiv_https___arxiv_org_abs_2406_15480
institution	arXiv
publishDate	2024
record_format	arxiv
title	On Giant's Shoulders: Effortless Weak to Strong by Dynamic Logits Fusion
topic	Computation and Language; Artificial Intelligence; Machine Learning
url	https://arxiv.org/abs/2406.15480
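
Illustrative note: the abstract in the contents field describes fusing logits from task-specific small models into a larger model, with weights chosen at each decoding step through a Kullback-Leibler divergence constraint. The sketch below is one possible reading of that idea for a single decoding step and is not taken from the paper. It assumes a common form of logit arithmetic (large-model logits plus a weighted difference between a fine-tuned small model and its untuned base), and it assumes the weight is chosen so that the divergence the fused distribution has from the large model matches the divergence the small expert has from its base. The function names, the grid search, and the exact objective are all hypothetical.

```python
import math


def softmax(logits):
    """Convert a list of logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def fuse_logits(large_logits, small_base_logits, expert_logits_list, weights):
    """Assumed logit arithmetic: large + sum_k w_k * (expert_k - small_base)."""
    fused = list(large_logits)
    for expert_logits, w in zip(expert_logits_list, weights):
        for i, (e, b) in enumerate(zip(expert_logits, small_base_logits)):
            fused[i] += w * (e - b)
    return fused


def pick_weight(large_logits, small_base_logits, expert_logits, grid):
    """Grid-search one weight for one expert at one decoding step.

    Assumed objective (not confirmed by the abstract): make
    KL(fused || large) as close as possible to KL(expert || small_base).
    """
    target = kl_divergence(softmax(expert_logits), softmax(small_base_logits))
    p_large = softmax(large_logits)
    best_alpha, best_gap = None, float("inf")
    for alpha in grid:
        fused = fuse_logits(large_logits, small_base_logits, [expert_logits], [alpha])
        gap = (kl_divergence(softmax(fused), p_large) - target) ** 2
        if gap < best_gap:
            best_alpha, best_gap = alpha, gap
    return best_alpha
```

In a decoding loop, `pick_weight` would be called once per generated token with the three models' next-token logits, and the next token would then be sampled from `softmax(fuse_logits(...))`. A static-ratio baseline corresponds to skipping the search and always passing the same weight.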

Similar Items