Staff View: DLO: Dynamic Layer Operation for Efficient Vertical Scaling of LLMs :: Library Catalog

Bibliographic Details
Main Authors:	Tan, Zhen; Dong, Daize; Zhao, Xinyu; Peng, Jie; Cheng, Yu; Chen, Tianlong
Format:	Preprint
Published:	2024
Subjects:	Machine Learning; Artificial Intelligence; Computation and Language
Online Access:	https://arxiv.org/abs/2407.11030

_version_	1866914871932616704
author	Tan, Zhen; Dong, Daize; Zhao, Xinyu; Peng, Jie; Cheng, Yu; Chen, Tianlong
author_facet	Tan, Zhen; Dong, Daize; Zhao, Xinyu; Peng, Jie; Cheng, Yu; Chen, Tianlong
contents	In this paper, we introduce Dynamic Layer Operations (DLO), a novel approach for vertically scaling transformer-based Large Language Models (LLMs) by dynamically expanding, activating, or skipping layers using a sophisticated routing policy based on layerwise feature similarity. Unlike traditional Mixture-of-Experts (MoE) methods that focus on extending the model width, our approach targets model depth, addressing the redundancy observed across layer representations for various input samples. Our framework is integrated with the Supervised Fine-Tuning (SFT) stage, eliminating the need for resource-intensive Continual Pre-Training (CPT). Experimental results demonstrate that DLO not only outperforms the original unscaled models but also achieves comparable results to densely expanded models with significantly improved efficiency. Our work offers a promising direction for building efficient yet powerful LLMs. We will release our implementation and model weights upon acceptance.
format	Preprint
id	arxiv_https___arxiv_org_abs_2407_11030
institution	arXiv
publishDate	2024
record_format	arxiv
title	DLO: Dynamic Layer Operation for Efficient Vertical Scaling of LLMs
topic	Machine Learning; Artificial Intelligence; Computation and Language
url	https://arxiv.org/abs/2407.11030

Similar Items