Staff View: nncase: An End-to-End Compiler for Efficient LLM Deployment on Heterogeneous Storage Architectures :: Library Catalog

Bibliographic Details
Main Authors:	Guo, Hui; Zheng, Qihang; Huo, Chenghai; Guo, Dongliang; Yang, Haoqi; Zhang, Yang
Format:	Preprint
Published:	2025
Subjects:	Distributed, Parallel, and Cluster Computing; Machine Learning
Online Access:	https://arxiv.org/abs/2512.21571

_version_	1866908731847999488
author	Guo, Hui; Zheng, Qihang; Huo, Chenghai; Guo, Dongliang; Yang, Haoqi; Zhang, Yang
author_facet	Guo, Hui; Zheng, Qihang; Huo, Chenghai; Guo, Dongliang; Yang, Haoqi; Zhang, Yang
contents	The efficient deployment of large language models (LLMs) is hindered by memory architecture heterogeneity, where traditional compilers suffer from fragmented workflows and high adaptation costs. We present nncase, an open-source, end-to-end compilation framework designed to unify optimization across diverse targets. Central to nncase is an e-graph-based term rewriting engine that mitigates the phase ordering problem, enabling global exploration of computation and data movement strategies. The framework integrates three key modules: Auto Vectorize for adapting to heterogeneous computing units, Auto Distribution for searching parallel strategies with cost-aware communication optimization, and Auto Schedule for maximizing on-chip cache locality. Furthermore, a buffer-aware Codegen phase ensures efficient kernel instantiation. Evaluations show that nncase outperforms mainstream frameworks like MLC LLM and Intel IPEX on Qwen3 series models and achieves performance comparable to the hand-optimized llama.cpp on CPUs, demonstrating the viability of automated compilation for high-performance LLM deployment. The source code is available at https://github.com/kendryte/nncase.
format	Preprint
id	arxiv_https___arxiv_org_abs_2512_21571
institution	arXiv
publishDate	2025
record_format	arxiv
title	nncase: An End-to-End Compiler for Efficient LLM Deployment on Heterogeneous Storage Architectures
topic	Distributed, Parallel, and Cluster Computing; Machine Learning
url	https://arxiv.org/abs/2512.21571
