Staff View: :: Library Catalog

Saved in:

Bibliographic Details
Main Authors:	Zhao, Liang, Shao, Kunming, Liao, Zhipeng, Huang, Xijie, Cheng, Tim Kwang-Ting, Tsui, Chi-Ying, Zou, Yi
Format:	Preprint
Published:	2026
Subjects:	Hardware Architecture
Online Access:	https://arxiv.org/abs/2602.05743
Tags:	Add Tag No Tags, Be the first to tag this record!

_version_	1866911693532037120
author	Zhao, Liang Shao, Kunming Liao, Zhipeng Huang, Xijie Cheng, Tim Kwang-Ting Tsui, Chi-Ying Zou, Yi
author_facet	Zhao, Liang Shao, Kunming Liao, Zhipeng Huang, Xijie Cheng, Tim Kwang-Ting Tsui, Chi-Ying Zou, Yi
contents	FP8 low-precision formats have gained significant adoption in Transformer inference and training. However, existing digital compute-in-memory (DCIM) architectures face challenges in supporting variable FP8 aligned-mantissa bitwidths, as unified alignment strategies and fixed-precision multiply-accumulate (MAC) units struggle to handle input data with diverse distributions. This work presents a flexible FP8 DCIM accelerator with three innovations: (1) a dynamic shift-aware bitwidth prediction (DSBP) with on-the-fly input prediction that adaptively adjusts weight (2/4/6/8b) and input (2$\sim$12b) aligned-mantissa precision; (2) a FIFO-based input alignment unit (FIAU) replacing complex barrel shifters with pointer-based control; and (3) a precision-scalable INT MAC array achieving flexible weight precision with minimal overhead. Implemented in 28nm CMOS with a 64$\times$96 CIM array, the design achieves 20.4 TFLOPS/W for fixed E5M7, demonstrating 2.8$\times$ higher FP8 efficiency than previous work while supporting all FP8 formats. Results on Llama-7b show that the DSBP achieves higher efficiency than fixed bitwidth mode at the same accuracy level on both BoolQ and Winogrande datasets, with configurable parameters enabling flexible accuracy-efficiency trade-offs.
format	Preprint
id	arxiv_https___arxiv_org_abs_2602_05743
institution	arXiv
publishDate	2026
record_format	arxiv
spellingShingle	Balancing FP8 Computation Accuracy and Efficiency on Digital CIM via Shift-Aware On-the-fly Aligned-Mantissa Bitwidth Prediction Zhao, Liang Shao, Kunming Liao, Zhipeng Huang, Xijie Cheng, Tim Kwang-Ting Tsui, Chi-Ying Zou, Yi Hardware Architecture FP8 low-precision formats have gained significant adoption in Transformer inference and training. However, existing digital compute-in-memory (DCIM) architectures face challenges in supporting variable FP8 aligned-mantissa bitwidths, as unified alignment strategies and fixed-precision multiply-accumulate (MAC) units struggle to handle input data with diverse distributions. This work presents a flexible FP8 DCIM accelerator with three innovations: (1) a dynamic shift-aware bitwidth prediction (DSBP) with on-the-fly input prediction that adaptively adjusts weight (2/4/6/8b) and input (2$\sim$12b) aligned-mantissa precision; (2) a FIFO-based input alignment unit (FIAU) replacing complex barrel shifters with pointer-based control; and (3) a precision-scalable INT MAC array achieving flexible weight precision with minimal overhead. Implemented in 28nm CMOS with a 64$\times$96 CIM array, the design achieves 20.4 TFLOPS/W for fixed E5M7, demonstrating 2.8$\times$ higher FP8 efficiency than previous work while supporting all FP8 formats. Results on Llama-7b show that the DSBP achieves higher efficiency than fixed bitwidth mode at the same accuracy level on both BoolQ and Winogrande datasets, with configurable parameters enabling flexible accuracy-efficiency trade-offs.
title	Balancing FP8 Computation Accuracy and Efficiency on Digital CIM via Shift-Aware On-the-fly Aligned-Mantissa Bitwidth Prediction
topic	Hardware Architecture
url	https://arxiv.org/abs/2602.05743

Similar Items