Staff View: :: Library Catalog

Bibliographic Details
Main Authors:	Li, Ning; Guo, Song; Zhang, Tuo; Li, Muqing; Hong, Zicong; Zhou, Qihua; Yuan, Xin; Zhang, Haijun
Format:	Preprint
Published:	2025
Subjects:	Networking and Internet Architecture
Online Access:	https://arxiv.org/abs/2502.08381

_version_	1866916610353135616
author	Li, Ning; Guo, Song; Zhang, Tuo; Li, Muqing; Hong, Zicong; Zhou, Qihua; Yuan, Xin; Zhang, Haijun
contents	The power of LLMs suggests that deploying various LLMs of different scales and architectures across end devices, the edge, and the cloud, so as to satisfy different requirements and adapt to heterogeneous hardware, is the critical way to achieve ubiquitous intelligence for 6G. However, the massive parameter scale of LLMs poses significant challenges for deploying them on edge devices because of their high computational and storage demands. Since the sparse activation in Mixture of Experts (MoE) enables scalable and dynamic allocation of computation and communication resources at the edge, this paper proposes a novel MoE-empowered collaborative deployment framework for edge LLMs, denoted CoEL. The framework fully leverages the properties of the MoE architecture and encompasses four key aspects: Perception, Deployment, Compression, and Updating. Edge servers broadcast their resource status and the specific resource requirements of LLMs to their neighbors. Using this information, two deployment strategies are proposed to accommodate varying model scales and ensure that each model is deployed effectively: one deploys an LLM on a single edge device through intra-device resource collaboration, and the other distributes an LLM across multiple edge devices through inter-device resource collaboration. Furthermore, both the models and the intermediate data are compressed: quantization reduces the models' memory footprint, while token fusion and pruning reduce the volume of intermediate data. Finally, given the dynamics of network topology, resource status, and user requirements, the deployment strategies are updated regularly to maintain their relevance and effectiveness. This paper also delineates the challenges and potential research directions for deploying edge LLMs.
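The two MoE properties the abstract relies on, sparse activation and spreading experts over several edge servers according to their broadcast resource status, can be illustrated with a minimal Python sketch. This is not the CoEL algorithm from the paper: it assumes top-k softmax gating as the MoE router and a hypothetical greedy "largest expert first, onto the server with the most free memory" placement rule. The names `EdgeDevice`, `top_k_gate`, and `place_experts`, and all memory figures, are illustrative assumptions.

```python
import math
from dataclasses import dataclass, field


@dataclass
class EdgeDevice:
    """Hypothetical edge server; free_mem_mb stands in for its broadcast resource status."""
    name: str
    free_mem_mb: float
    experts: list = field(default_factory=list)


def top_k_gate(logits, k):
    """Top-k softmax gating: only the k highest-scoring experts are activated."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]  # subtract max for numerical stability
    total = sum(exps)
    probs = [e / total for e in exps]
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    # Renormalise the surviving gate weights so they sum to 1.
    return {i: probs[i] / norm for i in top}


def place_experts(expert_mem_mb, devices):
    """Hypothetical greedy placement: largest expert first, onto the device with the most free memory."""
    order = sorted(range(len(expert_mem_mb)), key=lambda e: expert_mem_mb[e], reverse=True)
    placement = {}
    for e in order:
        dev = max(devices, key=lambda d: d.free_mem_mb)
        # If the emptiest device cannot fit this expert, no device can.
        if dev.free_mem_mb < expert_mem_mb[e]:
            raise RuntimeError(f"no device has room for expert {e}")
        dev.free_mem_mb -= expert_mem_mb[e]
        dev.experts.append(e)
        placement[e] = dev.name
    return placement


# Illustrative numbers only: four 1000 MB experts spread over two edge servers.
devices = [EdgeDevice("edge-A", 3000), EdgeDevice("edge-B", 2000)]
placement = place_experts([1000, 1000, 1000, 1000], devices)
gates = top_k_gate([0.1, 2.0, -1.0, 1.5], k=2)
active_devices = {placement[e] for e in gates}  # only these servers do work for this token
```

With these assumed inputs the router activates experts 1 and 3, both hosted on edge-A, so edge-B does no expert computation for this token. The sketch ignores link bandwidth, compute load, and expert popularity, all of which a practical placement policy would likely need to weigh.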
format	Preprint
id	arxiv_https___arxiv_org_abs_2502_08381
institution	arXiv
publishDate	2025
record_format	arxiv
title	The MoE-Empowered Edge LLMs Deployment: Architecture, Challenges, and Opportunities
topic	Networking and Internet Architecture
url	https://arxiv.org/abs/2502.08381

Similar Items