Staff View: MACO: Exploring GEMM Acceleration on a Loosely-Coupled Multi-core Processor :: Library Catalog

Bibliographic Details
Main Authors:	Sui, Bingcai; Shen, Junzhong; Sun, Caixia; Wang, Junhui; Zheng, Zhong; Guo, Wei
Format:	Preprint
Published:	2024
Subjects:	Hardware Architecture
Online Access:	https://arxiv.org/abs/2404.19180

_version_	1866913335893557248
author	Sui, Bingcai; Shen, Junzhong; Sun, Caixia; Wang, Junhui; Zheng, Zhong; Guo, Wei
author_facet	Sui, Bingcai; Shen, Junzhong; Sun, Caixia; Wang, Junhui; Zheng, Zhong; Guo, Wei
contents	General-purpose processor vendors have integrated customized accelerators in their products due to the widespread use of General Matrix-Matrix Multiplication (GEMM) kernels. However, it remains a challenge to further improve the flexibility and scalability of these GEMM-enhanced processors to cater to the emerging large-scale GEMM workloads. In this paper we propose MACO, a novel loosely-coupled multi-core general-purpose architecture optimized for GEMM-related applications. To enhance the programmability and flexibility of MACO, the paper introduces a tile-based instruction set architecture. Additionally, the paper presents techniques such as hardware-assisted data prefetching and locking, and predictive address translation to further enhance the computational efficiency of MACO for GEMM workloads. The experimental results demonstrate that MACO exhibits good scalability, achieving an average computational efficiency of 90% across multiple cores. Furthermore, evaluations on state-of-the-art deep neural networks show that MACO can achieve up to 1.1 TFLOPS with 88% computational efficiency, indicating its adaptivity to deep learning workloads.
format	Preprint
id	arxiv_https___arxiv_org_abs_2404_19180
institution	arXiv
publishDate	2024
record_format	arxiv
title	MACO: Exploring GEMM Acceleration on a Loosely-Coupled Multi-core Processor
topic	Hardware Architecture
url	https://arxiv.org/abs/2404.19180
