MARC21 :: Library Catalog

Bibliographic Details
Main Authors:	Song, Qingyu; Liu, Rui; Lin, Wei; Liao, Peiyu; Zhao, Wenqian; Wang, Yiwen; Hu, Shoubo; Jiang, Yining; Long, Mochun; Zhen, Hui-Ling; Jiang, Ning; Yuan, Mingxuan; Xiang, Qiao; Xu, Hong
Format:	Preprint
Published:	2025
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2505.15030

_version_	1866915863505928192
author	Song, Qingyu; Liu, Rui; Lin, Wei; Liao, Peiyu; Zhao, Wenqian; Wang, Yiwen; Hu, Shoubo; Jiang, Yining; Long, Mochun; Zhen, Hui-Ling; Jiang, Ning; Yuan, Mingxuan; Xiang, Qiao; Xu, Hong
author_facet	Song, Qingyu; Liu, Rui; Lin, Wei; Liao, Peiyu; Zhao, Wenqian; Wang, Yiwen; Hu, Shoubo; Jiang, Yining; Long, Mochun; Zhen, Hui-Ling; Jiang, Ning; Yuan, Mingxuan; Xiang, Qiao; Xu, Hong
contents	Deploying Large Language Models (LLMs) on edge devices enhances privacy but faces performance hurdles due to limited resources. We introduce a systematic methodology to evaluate on-device LLMs, balancing capability, efficiency, and resource constraints. Through an extensive analysis of models (0.5B-14B) and seven post-training quantization (PTQ) methods on commodity hardware, we demonstrate that: 1) Heavily quantized large models consistently outperform smaller, high-precision models, with a performance threshold at ~3.5 effective bits-per-weight (BPW); 2) Resource utilization scales linearly with BPW, though power and memory footprints vary by quantization algorithm; and 3) With a reduction in model size, the primary constraint on throughput transitions from communication overhead to computational latency. We conclude by offering guidelines for optimizing LLMs in resource-constrained edge environments. Our codebase is available at https://anonymous.4open.science/r/LLMOnDevice/.
format	Preprint
id	arxiv_https___arxiv_org_abs_2505_15030
institution	arXiv
publishDate	2025
record_format	arxiv
title	A Systematic Evaluation of On-Device LLMs: Quantization, Performance, and Resources
topic	Machine Learning
url	https://arxiv.org/abs/2505.15030
