Staff View :: Library Catalog

Bibliographic Details
Main authors:	Deng, Zihao; Sharify, Sayeh; Wang, Xin; Orshansky, Michael
Format:	Preprint
Published:	2023
Subjects:	Neural and Evolutionary Computing
Online access:	https://arxiv.org/abs/2307.05657

_version_	1866910905835454464
author	Deng, Zihao; Sharify, Sayeh; Wang, Xin; Orshansky, Michael
author_facet	Deng, Zihao; Sharify, Sayeh; Wang, Xin; Orshansky, Michael
contents	Quantization is a widely used technique to compress neural networks. Assigning uniform bit-widths across all layers can result in significant accuracy degradation at low precision and inefficiency at high precision. Mixed-precision quantization (MPQ) addresses this by assigning varied bit-widths to layers, optimizing the accuracy-efficiency trade-off. Existing sensitivity-based methods for MPQ assume that quantization errors across layers are independent, which leads to suboptimal choices. We introduce CLADO, a practical sensitivity-based MPQ algorithm that captures cross-layer dependency of quantization error. CLADO approximates pairwise cross-layer errors using linear equations on a small data subset. Layerwise bit-widths are assigned by optimizing a new MPQ formulation based on cross-layer quantization errors using an Integer Quadratic Program. Experiments with CNN and vision transformer models on ImageNet demonstrate that CLADO achieves state-of-the-art mixed-precision quantization performance. Code repository available here: https://github.com/JamesTuna/CLADO_MPQ
format	Preprint
id	arxiv_https___arxiv_org_abs_2307_05657
institution	arXiv
publishDate	2023
record_format	arxiv
title	Mixed-Precision Quantization for Deep Vision Models with Integer Quadratic Programming
topic	Neural and Evolutionary Computing
url	https://arxiv.org/abs/2307.05657
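
Illustrative note on the abstract (contents field): the record does not give CLADO's actual formulation or solver, so the following is only a minimal sketch of the general idea, under stated assumptions. It picks one bit-width per layer by minimising a quadratic form x^T G x over one-hot assignments, subject to a model-size budget. The matrix G, the layer sizes, the budget, and the function name solve_mpq_bruteforce are all hypothetical, and exhaustive search stands in for a real Integer Quadratic Program solver. The toy numbers are chosen so that dropping the cross-layer (off-diagonal) terms changes the selected assignment.

```python
import itertools
import numpy as np


def solve_mpq_bruteforce(G, params, bit_choices, budget_bits):
    """Exhaustively minimise x^T G x over one-hot bit-width assignments.

    G           : (L*B, L*B) symmetric matrix of hypothetical sensitivity terms;
                  index l*B + b means "layer l quantised to bit_choices[b]".
    params      : number of weights in each of the L layers.
    bit_choices : candidate bit-widths (length B).
    budget_bits : maximum total model size in bits.
    """
    L, B = len(params), len(bit_choices)
    best_cost, best_bits = float("inf"), None
    for choice in itertools.product(range(B), repeat=L):
        # Skip assignments that exceed the size budget.
        size = sum(p * bit_choices[b] for p, b in zip(params, choice))
        if size > budget_bits:
            continue
        # Build the one-hot indicator vector for this assignment.
        x = np.zeros(L * B)
        for l, b in enumerate(choice):
            x[l * B + b] = 1.0
        cost = float(x @ G @ x)
        if cost < best_cost:
            best_cost, best_bits = cost, tuple(bit_choices[b] for b in choice)
    return best_bits, best_cost


# Toy problem (all values invented for illustration): 2 layers, 2 bit-widths.
bit_choices = [4, 8]
params = [100, 100]
budget_bits = 1200  # rules out 8 bits for both layers (1600 bits)

# Index order: (layer0, 4b), (layer0, 8b), (layer1, 4b), (layer1, 8b).
G = np.diag([1.0, 0.1, 0.8, 0.1])
G[1, 2] = G[2, 1] = 0.3    # layer0 at 8b and layer1 at 4b interact badly
G[0, 3] = G[3, 0] = -0.2   # layer0 at 4b and layer1 at 8b partly cancel

# With cross-layer terms versus with the diagonal (independence) only.
bits_cross, cost_cross = solve_mpq_bruteforce(G, params, bit_choices, budget_bits)
bits_indep, cost_indep = solve_mpq_bruteforce(
    np.diag(np.diag(G)), params, bit_choices, budget_bits
)
print("with cross-layer terms:", bits_cross)
print("independence assumed:  ", bits_indep)
```

In this toy setting the independence assumption selects 8 bits for the first layer and 4 for the second, while the version with off-diagonal terms selects the reverse. Brute force scales exponentially with the number of layers, so a real model would need a dedicated integer programming solver.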
