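| Main Authors: | Duan, Cenlin; Yang, Jianlei; Wang, Yiou; Wang, Yikun; Qi, Yingjie; He, Xiaolin; Yan, Bonan; Wang, Xueyan; Jia, Xiaotao; Zhao, Weisheng |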
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | Hardware Architecture |
| Online Access: | https://arxiv.org/abs/2404.09497 |
| _version_ | 1866909169793105920 |
|---|---|
| author | Duan, Cenlin; Yang, Jianlei; Wang, Yiou; Wang, Yikun; Qi, Yingjie; He, Xiaolin; Yan, Bonan; Wang, Xueyan; Jia, Xiaotao; Zhao, Weisheng |
| author_facet | Duan, Cenlin; Yang, Jianlei; Wang, Yiou; Wang, Yikun; Qi, Yingjie; He, Xiaolin; Yan, Bonan; Wang, Xueyan; Jia, Xiaotao; Zhao, Weisheng |
| contents | Bit-level sparsity in neural network models harbors immense untapped potential. Eliminating redundant calculations of randomly distributed zero-bits significantly boosts computational efficiency. Yet, traditional digital SRAM-PIM architecture, limited by rigid crossbar architecture, struggles to effectively exploit this unstructured sparsity. To address this challenge, we propose Dyadic Block PIM (DB-PIM), a groundbreaking algorithm-architecture co-design framework. First, we propose an algorithm coupled with a distinctive sparsity pattern, termed a dyadic block (DB), that preserves the random distribution of non-zero bits to maintain accuracy while restricting the number of these bits in each weight to improve regularity. Architecturally, we develop a custom PIM macro that includes dyadic block multiplication units (DBMUs) and Canonical Signed Digit (CSD)-based adder trees, specifically tailored for Multiply-Accumulate (MAC) operations. An input pre-processing unit (IPU) further refines performance and efficiency by capitalizing on block-wise input sparsity. Results show that our proposed co-design framework achieves a remarkable speedup of up to 7.69x and energy savings of 83.43%. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2404_09497 |
| institution | arXiv |
| publishDate | 2024 |
| record_format | arxiv |
| title | Towards Efficient SRAM-PIM Architecture Design by Exploiting Unstructured Bit-Level Sparsity |
| topic | Hardware Architecture |
| url | https://arxiv.org/abs/2404.09497 |
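
The abstract above mentions Canonical Signed Digit (CSD) encoding and skipping work on zero bits during Multiply-Accumulate operations. The sketch below illustrates only the standard CSD recoding of an integer and a shift-and-add dot product that skips zero digits. It is not the DB-PIM algorithm or hardware from the preprint; the function names `to_csd`, `from_csd`, and `csd_dot` are hypothetical and chosen for illustration.

```python
from typing import List, Sequence, Tuple


def to_csd(value: int) -> List[int]:
    """Recode an integer into CSD digits in {-1, 0, 1}, least significant first.

    CSD (also called non-adjacent form) never has two adjacent non-zero digits.
    """
    digits: List[int] = []
    n = value
    while n != 0:
        if n & 1:
            # n % 4 == 1 -> digit +1; n % 4 == 3 -> digit -1
            d = 2 - (n % 4)
            n -= d
        else:
            d = 0
        digits.append(d)
        n //= 2  # n is even here, so the division is exact
    return digits


def from_csd(digits: Sequence[int]) -> int:
    """Reconstruct the integer from its CSD digits."""
    return sum(d * (1 << i) for i, d in enumerate(digits))


def csd_dot(weights: Sequence[int], inputs: Sequence[int]) -> Tuple[int, int]:
    """Dot product via shift-and-add over non-zero CSD digits of each weight.

    Returns the accumulated result and the number of add/subtract steps used,
    which equals the total count of non-zero weight digits.
    """
    acc = 0
    steps = 0
    for w, x in zip(weights, inputs):
        for i, d in enumerate(to_csd(w)):
            if d:  # zero digits contribute nothing and are skipped
                acc += d * (x << i)
                steps += 1
    return acc, steps


if __name__ == "__main__":
    weights = [7, -3, 0, 96]
    inputs = [2, 5, 9, 1]
    print(to_csd(7))                # digits of 7 = 8 - 1
    print(csd_dot(weights, inputs))
```

In this sketch the weight 7 (binary `111`, three non-zero bits) is recoded as 8 - 1, which needs only two add/subtract steps. A fixed cap on non-zero digits per weight, as the abstract describes for the dyadic block pattern, would bound the `steps` count per weight, but how the preprint enforces that cap is not specified in this record.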