Bibliographic Details
Main Authors: Liu, Yang, Chen, Yi, Zhao, Yongwei, Hao, Yifan, Zheng, Zifu, Kong, Weihao, Li, Zhangmai, Jiang, Dongchen, Xia, Ruiyang, Ma, Zhihong, Liu, Zisheng, Wan, Zhaoyong, Lu, Yunqi, Liu, Ximing, Guo, Hongrui, Yang, Zhihao, Wang, Zhe, Ma, Tianrui, Zou, Mo, Zhang, Rui, Li, Ling, Hu, Xing, Du, Zidong, Xu, Zhiwei, Guo, Qi, Chen, Tianshi, Chen, Yunji
Format: Preprint
Published: 2025
Subjects:
Online Access: https://arxiv.org/abs/2508.16151
contents The rapid advancement of Large Language Models (LLMs) has established language as a core general-purpose cognitive substrate, driving demand for specialized Language Processing Units (LPUs) tailored to LLM inference. To curb the growing energy consumption of LLM inference systems, this paper proposes the Hardwired-Neurons Language Processing Unit (HNLPU), which physically hardwires LLM weight parameters into the computational fabric and improves computational efficiency by several orders of magnitude through extreme specialization. A significant challenge, however, lies in the scale of modern LLMs: straightforwardly hardwiring gpt-oss 120B would require fabricating photomask sets valued at over $6 billion, which is economically impractical. To address this challenge, we propose the Metal-Embedding methodology. Instead of embedding weights in a 2D grid of silicon device cells, Metal-Embedding embeds weight parameters into the 3D topology of metal wires. This brings two benefits: (1) a 15x increase in density, and (2) 60 of the 70 photomask layers, including all EUV photomasks, are homogeneous across chips. In total, Metal-Embedding reduces the photomask cost by 112x, bringing the Non-Recurring Engineering (NRE) cost of HNLPU into an economically viable range. Experimental results show that HNLPU achieves 249,960 tokens/s (5,555x that of GPU and 85x that of WSE) and 36 tokens/J (1,047x that of GPU and 283x that of WSE), with a total die area of 13,232 mm² and an estimated NRE of $59.46M to $123.5M at 5 nm technology. Analysis shows that, assuming annual weight updates, HNLPU achieves a 41.7-80.4x improvement in cost-effectiveness and a 357x reduction in carbon footprint compared with OpenAI-scale H100 clusters.
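The headline figures in the abstract can be cross-checked with simple arithmetic. The sketch below is a back-of-envelope calculation only: the input numbers are those quoted in the abstract, while the derived quantities (implied baseline throughput and efficiency, implied power draw, number of chip-specific mask layers) are inferences that the record itself does not state, and all variable and function names are illustrative.

```python
# Back-of-envelope check of the figures quoted in the abstract.
# Inputs are taken from the abstract; every derived value is an
# inference, not a number reported in the record.

tokens_per_s = 249_960        # HNLPU throughput
tokens_per_j = 36             # HNLPU energy efficiency
speedup_gpu, speedup_wse = 5_555, 85          # throughput ratios vs GPU / WSE
eff_gain_gpu, eff_gain_wse = 1_047, 283       # efficiency ratios vs GPU / WSE
shared_layers, total_layers = 60, 70          # homogeneous photomask layers


def implied_baseline(value: float, ratio: float) -> float:
    """Baseline value implied by 'value is ratio times the baseline'."""
    return value / ratio


# (tokens/s) / (tokens/J) = J/s = W
power_w = tokens_per_s / tokens_per_j

gpu_tokens_per_s = implied_baseline(tokens_per_s, speedup_gpu)
wse_tokens_per_s = implied_baseline(tokens_per_s, speedup_wse)
gpu_tokens_per_j = implied_baseline(tokens_per_j, eff_gain_gpu)
wse_tokens_per_j = implied_baseline(tokens_per_j, eff_gain_wse)

# Layers that must be customized per chip under Metal-Embedding
custom_layers = total_layers - shared_layers
shared_fraction = shared_layers / total_layers

print(f"Implied HNLPU power draw: {power_w:.0f} W")
print(f"Implied GPU baseline: {gpu_tokens_per_s:.1f} tokens/s, "
      f"{gpu_tokens_per_j:.4f} tokens/J")
print(f"Implied WSE baseline: {wse_tokens_per_s:.1f} tokens/s, "
      f"{wse_tokens_per_j:.4f} tokens/J")
print(f"Chip-specific mask layers: {custom_layers} of {total_layers} "
      f"({shared_fraction:.1%} shared)")
```

Dividing throughput by energy efficiency gives an implied power draw of roughly 6.9 kW, and the stated ratios imply a GPU baseline of about 45 tokens/s. These are consistency checks on the quoted numbers, not measurements from the paper.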
id arxiv_https___arxiv_org_abs_2508_16151
institution arXiv
record_format arxiv
title Hardwired-Neurons Language Processing Units as General-Purpose Cognitive Substrates
topic Hardware Architecture
Computation and Language