Staff View: Hardwired-Neurons Language Processing Units as General-Purpose Cognitive Substrates

Bibliographic Details
Main Authors:	Liu, Yang; Chen, Yi; Zhao, Yongwei; Hao, Yifan; Zheng, Zifu; Kong, Weihao; Li, Zhangmai; Jiang, Dongchen; Xia, Ruiyang; Ma, Zhihong; Liu, Zisheng; Wan, Zhaoyong; Lu, Yunqi; Liu, Ximing; Guo, Hongrui; Yang, Zhihao; Wang, Zhe; Ma, Tianrui; Zou, Mo; Zhang, Rui; Li, Ling; Hu, Xing; Du, Zidong; Xu, Zhiwei; Guo, Qi; Chen, Tianshi; Chen, Yunji
Format:	Preprint
Published:	2025
Subjects:	Hardware Architecture; Computation and Language
Online Access:	https://arxiv.org/abs/2508.16151

_version_	1866915725026787328
author	Liu, Yang; Chen, Yi; Zhao, Yongwei; Hao, Yifan; Zheng, Zifu; Kong, Weihao; Li, Zhangmai; Jiang, Dongchen; Xia, Ruiyang; Ma, Zhihong; Liu, Zisheng; Wan, Zhaoyong; Lu, Yunqi; Liu, Ximing; Guo, Hongrui; Yang, Zhihao; Wang, Zhe; Ma, Tianrui; Zou, Mo; Zhang, Rui; Li, Ling; Hu, Xing; Du, Zidong; Xu, Zhiwei; Guo, Qi; Chen, Tianshi; Chen, Yunji
author_facet	Liu, Yang; Chen, Yi; Zhao, Yongwei; Hao, Yifan; Zheng, Zifu; Kong, Weihao; Li, Zhangmai; Jiang, Dongchen; Xia, Ruiyang; Ma, Zhihong; Liu, Zisheng; Wan, Zhaoyong; Lu, Yunqi; Liu, Ximing; Guo, Hongrui; Yang, Zhihao; Wang, Zhe; Ma, Tianrui; Zou, Mo; Zhang, Rui; Li, Ling; Hu, Xing; Du, Zidong; Xu, Zhiwei; Guo, Qi; Chen, Tianshi; Chen, Yunji
contents	The rapid advancement of Large Language Models (LLMs) has established language as a core general-purpose cognitive substrate, driving the demand for specialized Language Processing Units (LPUs) tailored for LLM inference. To overcome the growing energy consumption of LLM inference systems, this paper proposes a Hardwired-Neurons Language Processing Unit (HNLPU), which physically hardwires LLM weight parameters into the computational fabric, achieving several orders of magnitude computational efficiency improvement by extreme specialization. However, a significant challenge still lies in the scale of modern LLMs. A straightforward hardwiring of gpt-oss 120B would require fabricating photomask sets valued at over 6 billion dollars, rendering this straightforward solution economically impractical. Addressing this challenge, we propose the novel Metal-Embedding methodology. Instead of embedding weights in a 2D grid of silicon device cells, Metal-Embedding embeds weight parameters into the 3D topology of metal wires. This brings two benefits: (1) a 15x increase in density, and (2) 60 out of 70 photomask layers are homogeneous across chips, including all EUV photomasks. In total, Metal-Embedding reduced the photomask cost by 112x, bringing the Non-Recurring Engineering (NRE) cost of HNLPU into an economically viable range. Experimental results show that HNLPU achieved 249,960 tokens/s (5,555x/85x that of GPU/WSE), 36 tokens/J (1,047x/283x that of GPU/WSE), 13,232 mm² total die area, and an estimated NRE of $59.46M to $123.5M at 5 nm technology. Analysis shows that HNLPU achieved 41.7-80.4x improvement in cost-effectiveness and 357x reduction in carbon footprint compared to OpenAI-scale H100 clusters, under an annual weight updating assumption.
format	Preprint
id	arxiv_https___arxiv_org_abs_2508_16151
institution	arXiv
publishDate	2025
record_format	arxiv
title	Hardwired-Neurons Language Processing Units as General-Purpose Cognitive Substrates
topic	Hardware Architecture; Computation and Language
url	https://arxiv.org/abs/2508.16151

Similar Items