Affichage MARC: :: Library Catalog

Enregistré dans:

Détails bibliographiques
Auteurs principaux:	Dutta, Abhinav, Krishnan, Sanjeev, Kwatra, Nipun, Ramjee, Ramachandran
Format:	Preprint
Publié:	2024
Sujets:	Machine Learning
Accès en ligne:	https://arxiv.org/abs/2407.09141
Tags:	Ajouter un tag Pas de tags, Soyez le premier à ajouter un tag!

_version_	1866917946035535872
author	Dutta, Abhinav Krishnan, Sanjeev Kwatra, Nipun Ramjee, Ramachandran
author_facet	Dutta, Abhinav Krishnan, Sanjeev Kwatra, Nipun Ramjee, Ramachandran
contents	When Large Language Models (LLMs) are compressed using techniques such as quantization, the predominant way to demonstrate the validity of such techniques is by measuring the model's accuracy on various benchmarks.If the accuracies of the baseline model and the compressed model are close, it is assumed that there was negligible degradation in quality.However, even when the accuracy of baseline and compressed model are similar, we observe the phenomenon of flips, wherein answers change from correct to incorrect and vice versa in proportion.We conduct a detailed study of metrics across multiple compression techniques, models and datasets, demonstrating that the behavior of compressed models as visible to end-users is often significantly different from the baseline model, even when accuracy is similar.We further evaluate compressed models qualitatively and quantitatively using MT-Bench and show that compressed models are significantly worse than baseline models in this free-form generative task.Thus, we argue that compression techniques should also be evaluated using distance metrics.We propose two such metrics, KL-Divergence and flips, and show that they are well correlated.
format	Preprint
id	arxiv_https___arxiv_org_abs_2407_09141
institution	arXiv
publishDate	2024
record_format	arxiv
spellingShingle	Accuracy is Not All You Need Dutta, Abhinav Krishnan, Sanjeev Kwatra, Nipun Ramjee, Ramachandran Machine Learning When Large Language Models (LLMs) are compressed using techniques such as quantization, the predominant way to demonstrate the validity of such techniques is by measuring the model's accuracy on various benchmarks.If the accuracies of the baseline model and the compressed model are close, it is assumed that there was negligible degradation in quality.However, even when the accuracy of baseline and compressed model are similar, we observe the phenomenon of flips, wherein answers change from correct to incorrect and vice versa in proportion.We conduct a detailed study of metrics across multiple compression techniques, models and datasets, demonstrating that the behavior of compressed models as visible to end-users is often significantly different from the baseline model, even when accuracy is similar.We further evaluate compressed models qualitatively and quantitatively using MT-Bench and show that compressed models are significantly worse than baseline models in this free-form generative task.Thus, we argue that compression techniques should also be evaluated using distance metrics.We propose two such metrics, KL-Divergence and flips, and show that they are well correlated.
title	Accuracy is Not All You Need
topic	Machine Learning
url	https://arxiv.org/abs/2407.09141

Documents similaires