Bibliographic Details
Main Authors: Ni, Xiaobing, Ge, Mengke, Ruan, Jiaheng, Chen, Song, Kang, Yi
Format: Preprint
Published: 2024
Subjects:
Online Access: https://arxiv.org/abs/2412.11021
author Ni, Xiaobing
Ge, Mengke
Ruan, Jiaheng
Chen, Song
Kang, Yi
contents Streaming coarse-grained reconfigurable array (CGRA) is a promising architecture for data/computing-intensive applications because of its flexibility, high throughput, and efficient memory system. However, when accelerating sparse CNNs, the irregular input data demands inside sparse CNNs cause excessive caching operations (COPs) and multi-cycle internal dependencies (MCIDs) between operations, reducing the throughput of the streaming CGRA. We propose SparseMap, a mapping method for sparse CNNs onto streaming CGRA, which incorporates efficient I/O data management along with operation scheduling and binding to reduce the COPs and MCIDs, thereby ensuring the optimal throughput of the streaming CGRA. The experimental results show that SparseMap reduces COPs by 92.5% and MCIDs by 46.0% while achieving the same or even smaller initiation interval (II) compared to previous works.
format Preprint
id arxiv_https___arxiv_org_abs_2412_11021
institution arXiv
publishDate 2024
record_format arxiv
title SparseMap: Loop Mapping for Sparse CNNs on Streaming Coarse-grained Reconfigurable Array
topic Distributed, Parallel, and Cluster Computing
url https://arxiv.org/abs/2412.11021