Bibliographic Details
Main Authors: Wu, Guangxin, Zhang, Hao, Zhibin, Zhang, Guo, Jiafeng, Cheng, Xueqi
Format: Preprint
Published: 2026
Subjects: Computation and Language
Online Access: https://arxiv.org/abs/2601.02674
_version_ 1866914234918502400
author Wu, Guangxin
Zhang, Hao
Zhibin, Zhang
Guo, Jiafeng
Cheng, Xueqi
contents Large Language Models (LLMs) have achieved remarkable success across a wide spectrum of natural language processing tasks. However, their ever-growing scale introduces significant barriers to real-world deployment, including substantial computational overhead, memory footprint, and inference latency. While model pruning presents a viable solution to these challenges, existing unstructured pruning techniques often yield irregular sparsity patterns that necessitate specialized hardware or software support. In this work, we explore structured pruning, which eliminates entire architectural components and maintains compatibility with standard hardware accelerators. We introduce a novel structured pruning framework that leverages a hybrid multi-domain calibration set and an iterative calibration strategy to effectively identify and remove redundant channels. Extensive experiments on various models across diverse downstream tasks show that our approach achieves significant compression with minimal performance degradation.
format Preprint
id arxiv_https___arxiv_org_abs_2601_02674
institution arXiv
publishDate 2026
record_format arxiv
title Iterative Structured Pruning for Large Language Models with Multi-Domain Calibration
topic Computation and Language
url https://arxiv.org/abs/2601.02674