Bibliographic Details
Main Authors: Song, Qingyu; Liu, Rui; Lin, Wei; Liao, Peiyu; Zhao, Wenqian; Wang, Yiwen; Hu, Shoubo; Jiang, Yining; Long, Mochun; Zhen, Hui-Ling; Jiang, Ning; Yuan, Mingxuan; Xiang, Qiao; Xu, Hong
Format: Preprint
Published: 2025
Subjects: Machine Learning
Online Access: https://arxiv.org/abs/2505.15030
_version_ 1866915863505928192
author Song, Qingyu
Liu, Rui
Lin, Wei
Liao, Peiyu
Zhao, Wenqian
Wang, Yiwen
Hu, Shoubo
Jiang, Yining
Long, Mochun
Zhen, Hui-Ling
Jiang, Ning
Yuan, Mingxuan
Xiang, Qiao
Xu, Hong
contents Deploying Large Language Models (LLMs) on edge devices enhances privacy but faces performance hurdles due to limited resources. We introduce a systematic methodology to evaluate on-device LLMs, balancing capability, efficiency, and resource constraints. Through an extensive analysis of models (0.5B-14B) and seven post-training quantization (PTQ) methods on commodity hardware, we demonstrate that: 1) Heavily quantized large models consistently outperform smaller, high-precision models, with a performance threshold at ~3.5 effective bits-per-weight (BPW); 2) Resource utilization scales linearly with BPW, though power and memory footprints vary by quantization algorithm; and 3) With a reduction in model size, the primary constraint on throughput transitions from communication overhead to computational latency. We conclude by offering guidelines for optimizing LLMs in resource-constrained edge environments. Our codebase is available at https://anonymous.4open.science/r/LLMOnDevice/.
format Preprint
id arxiv_https___arxiv_org_abs_2505_15030
institution arXiv
publishDate 2025
record_format arxiv
title A Systematic Evaluation of On-Device LLMs: Quantization, Performance, and Resources
topic Machine Learning
url https://arxiv.org/abs/2505.15030
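
The abstract compares heavily quantized large models with smaller high-precision ones in terms of effective bits-per-weight (BPW). As a rough illustration of that arithmetic, weight storage can be approximated as parameters × BPW / 8 bytes. The sketch below is not taken from the paper or its codebase: the function name and the example model sizes are assumptions chosen for illustration, and the estimate ignores the KV cache, activations, and runtime overhead.

```python
def weight_footprint_gib(n_params: float, bpw: float) -> float:
    """Approximate weight storage in GiB for n_params weights at bpw bits each.

    Illustrative approximation only: bits -> bytes (/ 8) -> GiB (/ 2**30).
    """
    return n_params * bpw / 8 / 2**30


# Hypothetical comparison (sizes picked for illustration, not from the paper):
# a 14B-parameter model at ~3.5 BPW versus a 3B-parameter model at 16 BPW (FP16).
large_quantized = weight_footprint_gib(14e9, 3.5)
small_fp16 = weight_footprint_gib(3e9, 16)

print(f"14B @ 3.5 BPW: {large_quantized:.2f} GiB")
print(f"3B  @ 16  BPW: {small_fp16:.2f} GiB")
```

Under this approximation the two hypothetical configurations need a similar weight budget, which is the kind of setting where the abstract's first finding (preferring the larger, more heavily quantized model) would apply. The estimate is linear in BPW by construction; the abstract reports that measured resource use also scales linearly, with power and memory varying by quantization algorithm.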