Staff View: Revisiting Adaptive Rounding with Vectorized Reparameterization for LLM Quantization :: Library Catalog

Bibliographic Details
Main Authors:	Zhou, Yuli; Chen, Qingxuan; Benini, Luca; Sun, Guolei; Li, Yawei
Format:	Preprint
Published:	2026
Subjects:	Machine Learning; Computation and Language
Online Access:	https://arxiv.org/abs/2602.02151

_version_	1866918319408283648
author	Zhou, Yuli; Chen, Qingxuan; Benini, Luca; Sun, Guolei; Li, Yawei
author_facet	Zhou, Yuli; Chen, Qingxuan; Benini, Luca; Sun, Guolei; Li, Yawei
contents	Adaptive rounding has emerged as an alternative to round-to-nearest (RTN) for post-training quantization by enabling cross-element error cancellation. Yet dense, element-wise rounding matrices are prohibitively expensive for billion-parameter large language models (LLMs). We revisit adaptive rounding from an efficiency perspective and propose VQRound, a parameter-efficient optimization framework that reparameterizes the rounding matrix into a compact codebook. Unlike low-rank alternatives, VQRound minimizes the element-wise worst-case error under the $L_\infty$ norm, which is critical for handling heavy-tailed weight distributions in LLMs. Beyond reparameterization, we identify rounding initialization as a decisive factor and develop a lightweight end-to-end finetuning pipeline that optimizes codebooks across all layers using only 128 samples. Extensive experiments on OPT, LLaMA, LLaMA2, and Qwen3 models demonstrate that VQRound achieves better convergence than traditional adaptive rounding at the same number of steps while using as little as 0.2% of the trainable parameters. Our results show that adaptive rounding can be made both scalable and fast-fitting. The code is available at https://github.com/zhoustan/VQRound.
format	Preprint
id	arxiv_https___arxiv_org_abs_2602_02151
institution	arXiv
publishDate	2026
record_format	arxiv
title	Revisiting Adaptive Rounding with Vectorized Reparameterization for LLM Quantization
topic	Machine Learning; Computation and Language
url	https://arxiv.org/abs/2602.02151

Similar Items