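Staff View: Inference performance evaluation for LLMs on edge devices with a novel benchmarking framework and metric :: Library Catalog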

Bibliographic Details
Main Authors:	Chen, Hao; Tian, Cong; He, Zixuan; Yu, Bin; Liu, Yepang; Cao, Jialun
Format:	Preprint
Published:	2025
Subjects:	Performance
Online Access:	https://arxiv.org/abs/2508.11269

_version_	1866915447531634688
author	Chen, Hao; Tian, Cong; He, Zixuan; Yu, Bin; Liu, Yepang; Cao, Jialun
author_facet	Chen, Hao; Tian, Cong; He, Zixuan; Yu, Bin; Liu, Yepang; Cao, Jialun
contents	With the significant success of large language models (LLMs) such as LLaMA, edge computing-based LLM inference services for mobile devices and PCs are in high demand for data privacy. However, edge platforms differ in their hardware characteristics, and the large demand for memory capacity and bandwidth makes it very challenging to deploy and benchmark LLMs on edge devices. In this paper, we introduce a benchmarking tool named ELIB (edge LLM inference benchmarking) to evaluate the LLM inference performance of different edge platforms, and we propose a novel metric named MBU to indicate the percentage of the theoretically efficient use of available memory bandwidth for a specific model running on edge hardware, with the aim of optimizing memory usage. We deploy ELIB on three edge platforms and benchmark five quantized models to optimize MBU in combination with other metrics such as FLOPS, throughput, latency, and accuracy. We then analyze the results to derive the key factors, constraints, and unpredictability in optimizing MBU, which can guide the deployment of LLMs on more edge platforms.
format	Preprint
id	arxiv_https___arxiv_org_abs_2508_11269
institution	arXiv
publishDate	2025
record_format	arxiv
title	Inference performance evaluation for LLMs on edge devices with a novel benchmarking framework and metric
topic	Performance
url	https://arxiv.org/abs/2508.11269

Similar Items