Bibliographic Details
Main Authors:	Wang, Peipei; Guan, Wu; Liang, Liping; Wang, Zhijun; Luo, Hanqing; Zhang, Zhibin
Format:	Preprint
Published:	2025
Subjects:	Hardware Architecture
Online Access:	https://arxiv.org/abs/2507.14139

Table of Contents:

This paper introduces SpeedLLM, a neural network accelerator designed on the Xilinx Alveo U280 platform and optimized for the TinyLlama framework to improve edge computing performance. Key innovations include data stream parallelism, a memory reuse strategy, and Llama2 operator fusion, which together reduce latency and energy consumption. SpeedLLM's data pipeline architecture optimizes the read-compute-write cycle, while the memory reuse strategy minimizes FPGA resource demands. Operator fusion increases computational density and throughput. Results show that SpeedLLM outperforms conventional TinyLlama implementations, achieving up to 4.8× faster performance and 1.18× lower energy consumption, improvements that benefit deployment on edge devices.
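To illustrate why overlapping the read-compute-write cycle can reduce latency, here is a minimal, idealized timing model. It assumes a three-stage pipeline with fixed per-stage latencies, no stalls, and work split into equal tiles. The function names, tile count, and stage latencies are hypothetical and are not taken from SpeedLLM.

```python
# Idealized timing model of a read-compute-write pipeline.
# Assumption: fixed per-stage latencies, no stalls; all numbers are hypothetical.


def sequential_latency(n_tiles: int, read: float, compute: float, write: float) -> float:
    """Each tile finishes its read, compute, and write before the next tile starts."""
    return n_tiles * (read + compute + write)


def pipelined_latency(n_tiles: int, read: float, compute: float, write: float) -> float:
    """Stages overlap across tiles: after the first tile passes through all three
    stages, one more tile completes every slowest-stage interval."""
    if n_tiles == 0:
        return 0.0
    return (read + compute + write) + (n_tiles - 1) * max(read, compute, write)


if __name__ == "__main__":
    n, r, c, w = 64, 2.0, 3.0, 1.0  # hypothetical tile count and stage latencies
    seq = sequential_latency(n, r, c, w)
    pipe = pipelined_latency(n, r, c, w)
    print(f"sequential={seq}, pipelined={pipe}, speedup={seq / pipe:.2f}x")
```

In this model the steady-state throughput is bounded by the slowest stage, so the benefit of pipelining depends on how evenly the read, compute, and write stages are balanced.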
