Staff View: :: Library Catalog


Bibliographic Details
Main Authors:	Shan, Weiqiao; Meng, Long; Zheng, Tong; Luo, Yingfeng; Li, Bei; Wang, Junxin; Xiao, Tong; Zhu, Jingbo
Format:	Preprint
Published:	2024
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2412.01455

_version_	1866912140644843520
author	Shan, Weiqiao; Meng, Long; Zheng, Tong; Luo, Yingfeng; Li, Bei; Wang, Junxin; Xiao, Tong; Zhu, Jingbo
author_facet	Shan, Weiqiao; Meng, Long; Zheng, Tong; Luo, Yingfeng; Li, Bei; Wang, Junxin; Xiao, Tong; Zhu, Jingbo
contents	Large language models (LLMs) exhibit exceptional performance across various downstream tasks. However, their large number of parameters makes inference slow. Early exit (EE) is an approach for accelerating auto-regressive decoding: it generates outputs from intermediate layers instead of running the whole model, which offers a promising solution to this challenge. However, the additional output layers and joint optimization used in conventional EE hinder its application to LLMs. In this paper, we explore the feasibility of EE in LLMs without additional output layers or joint optimization. Our findings indicate that EE is a natural capability of transformer-based models. Although joint optimization does not give the model its EE capability, it is still needed to improve the accuracy with which gating functions locate the optimal exit layer. Additionally, our study reveals patterns in EE behavior from a sub-word perspective based on the LLaMA model, as well as the potential for EE at the sub-layer level.
format	Preprint
id	arxiv_https___arxiv_org_abs_2412_01455
institution	arXiv
publishDate	2024
record_format	arxiv
title	Early Exit Is a Natural Capability in Transformer-based Models: An Empirical Study on Early Exit without Joint Optimization
topic	Computation and Language
url	https://arxiv.org/abs/2412.01455
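The abstract describes early exit as emitting a token from an intermediate layer by reusing the model's own output head (no additional per-layer output layers), with gating functions deciding where to stop. The Python below is a minimal, hypothetical sketch of that decoding loop, not the authors' method: it assumes a simple confidence-threshold gate (exit once the top-1 probability reaches `threshold`) in place of the paper's gating functions, and `early_exit_decode_step`, `add_bias`, `identity_head`, and the toy four-layer model are illustrative names only.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def softmax(logits: Sequence[float]) -> List[float]:
    # Subtract the max logit for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def early_exit_decode_step(
    hidden: Vector,
    layers: Sequence[Callable[[Vector], Vector]],
    lm_head: Callable[[Vector], Vector],
    threshold: float,
) -> Tuple[int, int]:
    """Hypothetical sketch: return (token_id, exit_layer) for one decoding step.

    The shared final output head is applied to each intermediate hidden
    state (no extra exit layers). A confidence-threshold gate, assumed here
    in place of a learned gating function, stops at the first layer whose
    top-1 probability reaches `threshold`; the last layer is the fallback.
    """
    for depth, layer in enumerate(layers, start=1):
        hidden = layer(hidden)
        probs = softmax(lm_head(hidden))
        token = max(range(len(probs)), key=probs.__getitem__)
        if probs[token] >= threshold or depth == len(layers):
            return token, depth
    raise ValueError("the model needs at least one layer")


# Toy model (illustrative only): each "layer" adds a fixed bias, and the
# output head treats the 2-d hidden state directly as logits over 2 tokens.
def add_bias(bias: Vector) -> Callable[[Vector], Vector]:
    return lambda h: [x + b for x, b in zip(h, bias)]


def identity_head(h: Vector) -> Vector:
    return list(h)


layers = [add_bias([1.0, 0.0]) for _ in range(4)]

print(early_exit_decode_step([0.0, 0.0], layers, identity_head, threshold=0.9))  # (0, 3)
```

With a threshold of 0.9 the toy model exits after three of its four layers (top-1 probability is about 0.88 after layer 2 and about 0.95 after layer 3); raising the threshold pushes the exit deeper, and the final layer always serves as a fallback.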

Similar Items