Staff View: Iterative Structured Pruning for Large Language Models with Multi-Domain Calibration :: Library Catalog

Bibliographic Details
Main Authors:	Wu, Guangxin; Zhang, Hao; Zhibin, Zhang; Guo, Jiafeng; Cheng, Xueqi
Format:	Preprint
Published:	2026
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2601.02674

author	Wu, Guangxin; Zhang, Hao; Zhibin, Zhang; Guo, Jiafeng; Cheng, Xueqi
contents	Large Language Models (LLMs) have achieved remarkable success across a wide spectrum of natural language processing tasks. However, their ever-growing scale introduces significant barriers to real-world deployment, including substantial computational overhead, memory footprint, and inference latency. While model pruning presents a viable solution to these challenges, existing unstructured pruning techniques often yield irregular sparsity patterns that necessitate specialized hardware or software support. In this work, we explore structured pruning, which eliminates entire architectural components and maintains compatibility with standard hardware accelerators. We introduce a novel structured pruning framework that leverages a hybrid multi-domain calibration set and an iterative calibration strategy to effectively identify and remove redundant channels. Extensive experiments on various models across diverse downstream tasks show that our approach achieves significant compression with minimal performance degradation.
format	Preprint
id	arxiv_https___arxiv_org_abs_2601_02674
institution	arXiv
publishDate	2026
record_format	arxiv
title	Iterative Structured Pruning for Large Language Models with Multi-Domain Calibration
topic	Computation and Language
url	https://arxiv.org/abs/2601.02674
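The `contents` abstract names three ingredients: structured (channel-level) pruning, a calibration set mixing several domains, and iterative calibration. The following is a minimal sketch of that general idea, not the authors' method. It removes whole hidden channels from residual MLP blocks by deleting a column of the input projection and the matching row of the output projection, so the pruned weights stay dense and run on standard hardware. Everything specific here is a hypothetical choice made for illustration: the two-block residual MLP, the channel score (mean absolute activation times the L2 norm of the outgoing weights), the synthetic "domains", the linear pruning schedule, and all function names (`build_calibration`, `channel_scores`, `iterative_prune`). Iteration matters because pruning an earlier block changes the inputs, and therefore the scores, of later blocks.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def forward(blocks, x, collect=False):
    """Run residual MLP blocks; optionally collect each block's hidden activations."""
    hiddens = []
    for w_in, w_out in blocks:
        h = relu(x @ w_in)      # (tokens, d_ff of this block)
        if collect:
            hiddens.append(h)
        x = x + h @ w_out       # residual connection keeps d_model fixed
    return (x, hiddens) if collect else x


def build_calibration(domains, per_domain, rng):
    """Hypothetical hybrid calibration set: equal sample counts drawn from each domain."""
    parts = []
    for data in domains:
        idx = rng.choice(len(data), size=per_domain, replace=False)
        parts.append(data[idx])
    return np.concatenate(parts, axis=0)


def channel_scores(h, w_out):
    """Hypothetical importance score: mean |activation| times outgoing weight norm."""
    return np.abs(h).mean(axis=0) * np.linalg.norm(w_out, axis=1)


def prune_block(w_in, w_out, keep_idx):
    """Structured removal: drop whole columns of w_in and the matching rows of w_out."""
    return w_in[:, keep_idx], w_out[keep_idx, :]


def iterative_prune(blocks, calib, target_ratio, steps):
    """Remove target_ratio of each block's channels over `steps` rounds,
    re-running the calibration set through the current model every round."""
    original = [w_in.shape[1] for w_in, _ in blocks]
    for step in range(1, steps + 1):
        _, hiddens = forward(blocks, calib, collect=True)
        new_blocks = []
        for (w_in, w_out), h, d0 in zip(blocks, hiddens, original):
            # Linear schedule: after this step, keep this many channels.
            keep = d0 - int(round(d0 * target_ratio * step / steps))
            scores = channel_scores(h, w_out)
            keep_idx = np.sort(np.argsort(scores)[::-1][:keep])
            new_blocks.append(prune_block(w_in, w_out, keep_idx))
        blocks = new_blocks
    return blocks


rng = np.random.default_rng(0)
d_model, d_ff = 16, 64
blocks = [(rng.normal(size=(d_model, d_ff)) / np.sqrt(d_model),
           rng.normal(size=(d_ff, d_model)) / np.sqrt(d_ff)) for _ in range(2)]
# Synthetic stand-ins for inputs from three domains (an assumption, not real data).
domains = [rng.normal(loc=mu, size=(200, d_model)) for mu in (-0.5, 0.0, 0.5)]
calib = build_calibration(domains, per_domain=32, rng=rng)

pruned = iterative_prune(blocks, calib, target_ratio=0.5, steps=4)
print([w_in.shape for w_in, _ in pruned])  # [(16, 32), (16, 32)]
```

With four steps, each block shrinks from 64 to 56, 48, 40 and finally 32 hidden channels, and the second block is re-scored each round on inputs produced by the already-pruned first block. A one-shot variant would be `steps=1`.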

Similar Items