Staff View :: Library Catalog


Bibliographic Details
Main Authors:	Zhou, Wenqiang, Yu, Zhendong, Liu, Xinyu, Yang, Jiaming, Xiao, Rong, Wang, Tao, Tang, Chenwei, Lv, Jiancheng
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition, Computational Complexity
Online Access:	https://arxiv.org/abs/2504.17263

_version_	1866913806640218112
author	Zhou, Wenqiang; Yu, Zhendong; Liu, Xinyu; Yang, Jiaming; Xiao, Rong; Wang, Tao; Tang, Chenwei; Lv, Jiancheng
contents	Quantization Aware Training (QAT) is a neural network quantization technique that compresses model size and improves operational efficiency while effectively maintaining model performance. The QAT paradigm introduces fake quantization operators during training, allowing the model to compensate autonomously for the information loss caused by quantization. Making the quantization parameters trainable can significantly improve QAT performance, but at the cost of flexibility during inference, especially when activation values follow substantially different distributions. In this paper, we propose an effective learnable adaptive neural network quantization method, called Adaptive Step Size Quantization (ASQ), to resolve this conflict. Specifically, ASQ first dynamically adjusts the quantization scaling factors through a trained module capable of accommodating different activations. Then, to address the rigid resolution inherent in Power of Two (POT) quantization, we propose an efficient non-uniform quantization scheme. We use the Power Of Square root of Two (POST) as the basis for exponential quantization, which effectively handles the bell-shaped distribution of neural network weights across various bit-widths while maintaining computational efficiency through a Look-Up Table (LUT) method. Extensive experimental results demonstrate that ASQ outperforms state-of-the-art QAT approaches. Notably, ASQ is even competitive with full-precision baselines: its 4-bit quantized ResNet34 model improves accuracy by 1.2% on ImageNet.
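The abstract names three mechanisms: fake quantization (quantize, then dequantize, during training), a trained module that sets the step size from the incoming activations, and a non-uniform level set built on powers of the square root of two that is served through a look-up table. The Python sketch that follows illustrates each one under stated assumptions; it is not the authors' implementation. The function names are hypothetical. The softplus-of-mean-absolute-activation form of `adaptive_step` is a placeholder for ASQ's trained module, whose architecture the abstract does not describe. The level set `0, ±alpha * sqrt(2)**(-k)` is an assumed reading of "POST" by analogy with POT's `±2**(-k)`.

```python
import math
from bisect import bisect_left


def fake_quantize(x, step, bits):
    """Uniform fake quantization: map x onto a signed integer grid, then
    dequantize, so training sees the rounding error as a float value.
    In QAT the step size would be a trainable parameter."""
    qmin = -(2 ** (bits - 1))
    qmax = 2 ** (bits - 1) - 1
    q = round(x / step)            # Python's round() rounds half to even
    q = max(qmin, min(qmax, q))    # clamp to the representable range
    return q * step


def adaptive_step(activations, w, b):
    """HYPOTHETICAL stand-in for ASQ's trained module: maps the mean absolute
    activation to a positive step size via softplus; w and b would be learned."""
    m = sum(abs(a) for a in activations) / len(activations)
    return math.log1p(math.exp(w * m + b))


def post_lut(bits, alpha=1.0):
    """ASSUMED POST level set: 0 and +/- alpha * sqrt(2)**(-k).
    sqrt(2)**(-k) == 2**(-k/2), so even k are plain powers of two.
    The sorted list doubles as the look-up table from integer code to value."""
    n = 2 ** (bits - 1) - 1        # number of positive non-zero levels
    pos = [alpha * 2.0 ** (-k / 2) for k in range(n)]
    return sorted([-p for p in pos] + [0.0] + pos)


def post_encode(x, lut):
    """Return the integer code (index into lut) of the level nearest to x."""
    i = bisect_left(lut, x)
    if i == 0:
        return 0
    if i == len(lut):
        return len(lut) - 1
    # pick the closer of the two neighbouring levels
    return i - 1 if x - lut[i - 1] <= lut[i] - x else i


def post_decode(code, lut):
    """Dequantize with a single table lookup instead of computing a power."""
    return lut[code]


if __name__ == "__main__":
    lut = post_lut(bits=3)
    weights = [0.6, 0.65, -0.9, 0.1]
    codes = [post_encode(w, lut) for w in weights]
    print(codes, [post_decode(c, lut) for c in codes])
    step = adaptive_step([0.2, -0.5, 1.3], w=1.0, b=-2.0)
    print([fake_quantize(a, step, bits=4) for a in [0.2, -0.5, 1.3]])
```

With three bits, the assumed POST table holds seven levels (`±1`, `±0.707…`, `±0.5`, `0`), whereas a POT table of the same size would hold `±1`, `±0.5`, `±0.25`, `0`. The extra `1/sqrt(2)` level between consecutive powers of two is one way to read the abstract's point about POT's rigid resolution. Storing the levels once and decoding by index keeps inference to a table lookup, which is in the spirit of the LUT method the abstract mentions, though the paper's actual table layout may differ.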
format	Preprint
id	arxiv_https___arxiv_org_abs_2504_17263
institution	arXiv
publishDate	2025
record_format	arxiv
title	Precision Neural Network Quantization via Learnable Adaptive Modules
topic	Computer Vision and Pattern Recognition, Computational Complexity
url	https://arxiv.org/abs/2504.17263
