

Bibliographic Details
Main Authors:	Lu, Yao; Li, Yuqi; Xie, Wenbin; Yu, Shanqing; Xuan, Qi; Zhu, Zhaowei; Wen, Shiping
Format:	Preprint
Published:	2025
Subjects:	Machine Learning; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2510.23652

_version_	1866914116550000640
author	Lu, Yao; Li, Yuqi; Xie, Wenbin; Yu, Shanqing; Xuan, Qi; Zhu, Zhaowei; Wen, Shiping
contents	Although large language models (LLMs) have achieved revolutionary breakthroughs in many fields, their large model size and high computational cost pose significant challenges for practical deployment on resource-constrained edge devices. To this end, layer pruning has been proposed to reduce the computational overhead by directly removing redundant layers. However, existing layer pruning methods typically rely on hand-crafted metrics to evaluate and remove individual layers, while ignoring the dependencies between layers. This can disrupt the model's information flow and severely degrade performance. To address these issues, we propose CLP, a novel continuous layer pruning framework that introduces two key innovations: a differentiable concave gate algorithm that automatically identifies the best continuous layer segments for pruning via gradient-based optimization; and a cutoff endpoint tuning strategy that effectively restores model performance by fine-tuning only the layers adjacent to the pruned segments. Extensive experiments across multiple model architectures (including LLaMA2, LLaMA3 and Qwen) and sizes (from 7B to 70B parameters) show that CLP significantly outperforms existing state-of-the-art baselines. For example, at a pruning rate of 20%, CLP achieves an average performance retention of 95.34% on LLaMA3-70B, outperforming baselines by 4.29% to 30.52%. Furthermore, CLP can be seamlessly combined with quantization to further compress the model with only a slight performance loss.
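The abstract describes removing one contiguous block of layers and then fine-tuning only the layers at the block's boundaries. The Python sketch below illustrates that bookkeeping under loudly stated assumptions: `window_gates` is a generic sigmoid-based soft window, not the paper's differentiable concave gate; `best_segment` uses exhaustive search with a caller-supplied `score` callback as a stand-in for the paper's gradient-based optimization; every function name and the toy importance values are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple


def stable_sigmoid(x: float) -> float:
    # Numerically stable logistic function (avoids overflow in math.exp).
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def num_layers_to_prune(num_layers: int, rate: float) -> int:
    # Example: a 20% rate on an 80-layer model removes 16 layers.
    return round(num_layers * rate)


def window_gates(num_layers: int, start: float, length: float,
                 temperature: float = 0.1) -> List[float]:
    # HYPOTHETICAL soft "keep" gate per layer (not the paper's concave gate):
    # near 0 inside the window [start, start + length), near 1 outside.
    # It is smooth in start and length, so an autodiff framework could tune them.
    end = start + length
    gates = []
    for i in range(num_layers):
        center = i + 0.5
        inside = (stable_sigmoid((center - start) / temperature)
                  * stable_sigmoid((end - center) / temperature))
        gates.append(1.0 - inside)
    return gates


def gated_forward(x: float, layers: Sequence[Callable[[float], float]],
                  gates: Sequence[float]) -> float:
    # Residual stream: each block's update is scaled by its gate,
    # so a gate of 0 turns that block into the identity.
    for f, g in zip(layers, gates):
        x = x + g * f(x)
    return x


def best_segment(num_layers: int, k: int,
                 score: Callable[[int, int], float]) -> Tuple[int, int]:
    # Exhaustive search over contiguous half-open segments [s, s + k),
    # a stand-in for gradient-based search. Lower score = cheaper to remove.
    candidates = [(s, s + k) for s in range(num_layers - k + 1)]
    return min(candidates, key=lambda seg: score(*seg))


def cutoff_endpoints(num_layers: int, start: int, end: int) -> List[int]:
    # Original (pre-pruning) indices of the surviving layers directly
    # before and after the pruned segment; these are the ones to fine-tune.
    endpoints = []
    if start > 0:
        endpoints.append(start - 1)
    if end < num_layers:
        endpoints.append(end)
    return endpoints


def prune(layers: list, start: int, end: int) -> list:
    # Drop the contiguous segment [start, end).
    return layers[:start] + layers[end:]


if __name__ == "__main__":
    importance = [5.0, 4.0, 3.0, 1.0, 1.0, 1.0, 2.0, 6.0]  # invented toy scores
    seg = best_segment(len(importance), 3, lambda s, e: sum(importance[s:e]))
    print(seg, cutoff_endpoints(len(importance), *seg))  # prints (3, 6) [2, 6]
```

The gated residual form is the reason a soft gate can stand in for pruning: when a block's gate is driven to 0, the block contributes nothing, so the gated model matches the model with that block physically removed. Pruning is then a single slice, and only the one or two boundary layers returned by `cutoff_endpoints` are candidates for fine-tuning.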
format	Preprint
id	arxiv_https___arxiv_org_abs_2510_23652
institution	arXiv
publishDate	2025
record_format	arxiv
title	The Structural Scalpel: Automated Contiguous Layer Pruning for Large Language Models
topic	Machine Learning; Artificial Intelligence
url	https://arxiv.org/abs/2510.23652
