Staff View :: Library Catalog

Bibliographic Details
Main Authors:	Xiao, He, Yang, Qingyao, Xie, Dirui, Xu, Wendong, Su, Zunhai, yang, Runming, Zhou, Wenyong, Liu, Haobo, Liu, Zhengwu, Wong, Ngai
Format:	Preprint
Published:	2025
Subjects:	Machine Learning Artificial Intelligence
Online Access:	https://arxiv.org/abs/2508.03332

_version_	1866914221563838464
author	Xiao, He Yang, Qingyao Xie, Dirui Xu, Wendong Su, Zunhai yang, Runming Zhou, Wenyong Liu, Haobo Liu, Zhengwu Wong, Ngai
author_facet	Xiao, He Yang, Qingyao Xie, Dirui Xu, Wendong Su, Zunhai yang, Runming Zhou, Wenyong Liu, Haobo Liu, Zhengwu Wong, Ngai
contents	Large language models with billions of parameters are often over-provisioned: many layers contribute little unique information yet dominate the memory and energy footprint during inference. We present LieQ (Layer-wise information effectiveness Quantization), a hardware-native, metric-driven post-training quantization framework that addresses the critical challenge of maintaining accuracy in sub-8B models (those with fewer than 8B parameters) under extreme low-bit compression. LieQ keeps a uniform bit-width within each layer while mixing precision across layers, preserving standard multiplication kernels and avoiding irregular memory access, codebooks, or irregular formats at inference time. Our method uncovers a strong correlation between layer-wise functional saliency and representational compactness, revealing that layers with higher training-induced energy concentration are functionally irreplaceable. Leveraging this insight, we propose a purely geometry-driven sensitivity proxy that enables automatic bit-width allocation under a target average-bit budget without expensive gradient updates or inference-based perplexity probing. At sub-2-bit precision, LieQ consistently reduces the large accuracy gap typically observed for naive 2-bit baselines on the Qwen3 and LLaMA3.x families, while retaining standard-kernel efficiency. These properties make LieQ a practical path toward deploying small language models on resource-constrained edge devices. Code will be available here: https://github.com/HeXiao-55/LieQ-official.git.
format	Preprint
id	arxiv_https___arxiv_org_abs_2508_03332
institution	arXiv
publishDate	2025
record_format	arxiv
title	Exploring Layer-wise Information Effectiveness for Post-Training Quantization in Small Language Models
topic	Machine Learning Artificial Intelligence
url	https://arxiv.org/abs/2508.03332
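
Note on the contents field: the abstract describes assigning one bit-width per layer, chosen from a sensitivity score, so that the model meets a target average-bit budget. The Python sketch below illustrates that general idea only. It is not the authors' procedure: the greedy rule, the function names `allocate_bits` and `average_bits`, the two candidate bit-widths, and the numeric sensitivity scores are all assumptions made for illustration, and the record does not say how LieQ computes or uses its geometry-driven proxy.

```python
from typing import List, Sequence


def allocate_bits(
    sensitivity: Sequence[float],   # hypothetical per-layer scores, higher = more sensitive
    num_params: Sequence[int],      # parameter count of each layer
    target_avg_bits: float,         # parameter-weighted average bit budget
    low_bits: int = 2,              # assumed default precision
    high_bits: int = 4,             # assumed precision for protected layers
) -> List[int]:
    """Greedy sketch: start every layer at low_bits, then promote the most
    sensitive layers to high_bits while the average stays within budget."""
    if len(sensitivity) != len(num_params):
        raise ValueError("sensitivity and num_params must have the same length")

    total_params = sum(num_params)
    budget = target_avg_bits * total_params   # total bits allowed
    used = low_bits * total_params            # bits spent at the default precision
    if used > budget:
        raise ValueError("target_avg_bits is below low_bits")

    bits = [low_bits] * len(sensitivity)
    # Visit layers from most to least sensitive.
    order = sorted(range(len(sensitivity)), key=lambda i: sensitivity[i], reverse=True)
    for i in order:
        extra = (high_bits - low_bits) * num_params[i]
        if used + extra <= budget:
            bits[i] = high_bits
            used += extra
    return bits


def average_bits(bits: Sequence[int], num_params: Sequence[int]) -> float:
    """Parameter-weighted average bit-width of an allocation."""
    return sum(b * n for b, n in zip(bits, num_params)) / sum(num_params)


if __name__ == "__main__":
    scores = [0.9, 0.1, 0.5, 0.2]      # made-up sensitivity values
    sizes = [100, 100, 100, 100]       # made-up layer sizes
    plan = allocate_bits(scores, sizes, target_avg_bits=2.5)
    print(plan, average_bits(plan, sizes))
```

Because every layer keeps a single bit-width, an allocation like this stays compatible with standard uniform-precision kernels, which is the property the abstract emphasizes.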