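| Main authors: | Song, Qingyu; Liu, Rui; Lin, Wei; Liao, Peiyu; Zhao, Wenqian; Wang, Yiwen; Hu, Shoubo; Jiang, Yining; Long, Mochun; Zhen, Hui-Ling; Jiang, Ning; Yuan, Mingxuan; Xiang, Qiao; Xu, Hong |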
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | Machine Learning |
| Online access: | https://arxiv.org/abs/2505.15030 |
| _version_ | 1866915863505928192 |
|---|---|
| author | Song, Qingyu; Liu, Rui; Lin, Wei; Liao, Peiyu; Zhao, Wenqian; Wang, Yiwen; Hu, Shoubo; Jiang, Yining; Long, Mochun; Zhen, Hui-Ling; Jiang, Ning; Yuan, Mingxuan; Xiang, Qiao; Xu, Hong |
| contents | Deploying Large Language Models (LLMs) on edge devices enhances privacy but faces performance hurdles due to limited resources. We introduce a systematic methodology to evaluate on-device LLMs, balancing capability, efficiency, and resource constraints. Through an extensive analysis of models (0.5B-14B) and seven post-training quantization (PTQ) methods on commodity hardware, we demonstrate that: 1) Heavily quantized large models consistently outperform smaller, high-precision models, with a performance threshold at ~3.5 effective bits-per-weight (BPW); 2) Resource utilization scales linearly with BPW, though power and memory footprints vary by quantization algorithm; and 3) With a reduction in model size, the primary constraint on throughput transitions from communication overhead to computational latency. We conclude by offering guidelines for optimizing LLMs in resource-constrained edge environments. Our codebase is available at https://anonymous.4open.science/r/LLMOnDevice/. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2505_15030 |
| institution | arXiv |
| publishDate | 2025 |
| record_format | arxiv |
| title | A Systematic Evaluation of On-Device LLMs: Quantization, Performance, and Resources |
| topic | Machine Learning |
| url | https://arxiv.org/abs/2505.15030 |
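The abstract reports a performance threshold near ~3.5 effective bits-per-weight (BPW) and says resource use scales linearly with BPW. The sketch below illustrates that arithmetic under stated assumptions; it is not the paper's accounting method. It assumes group-wise quantization in which each group of `group_size` weights stores one scale (and optionally one zero-point), so the per-group metadata is amortized into the effective BPW. It also counts weight storage only, ignoring KV cache and activations. The function names `effective_bpw` and `weight_memory_gb` are hypothetical.

```python
def effective_bpw(bits: int, group_size: int, scale_bits: int = 16, zero_bits: int = 0) -> float:
    """Effective bits per weight for group-wise quantization (assumed scheme):
    the nominal bit width plus per-group scale/zero-point bits spread over the group."""
    return bits + (scale_bits + zero_bits) / group_size


def weight_memory_gb(num_params: float, bpw: float) -> float:
    """Approximate storage for the weights alone, in decimal gigabytes.
    Linear in BPW; excludes KV cache, activations, and runtime overhead."""
    return num_params * bpw / 8 / 1e9


if __name__ == "__main__":
    # 4-bit weights with one fp16 scale per 128 weights.
    print(effective_bpw(4, 128))
    # 3-bit weights with an fp16 scale and fp16 zero-point per 64 weights.
    print(effective_bpw(3, 64, scale_bits=16, zero_bits=16))
    # A 14B-parameter model at 3.5 BPW versus a 0.5B model at 16-bit precision.
    print(weight_memory_gb(14e9, 3.5), weight_memory_gb(0.5e9, 16))
```

Under these assumptions, 4-bit weights with one fp16 scale per 128 weights come to 4.125 effective BPW, and a hypothetical 14B-parameter model at 3.5 BPW needs about 6.1 GB for its weights.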