Saved in:
Bibliographic Details
Main Authors: Yang, Yu, González, Jordi Altayó, Delestrac, Paul, Hemani, Ahmed
Format: Preprint
Published: 2024
Subjects:
Online Access:https://arxiv.org/abs/2407.00207
Tags: Add Tag
No Tags, Be the first to tag this record!
_version_ 1866915260403810304
author Yang, Yu
González, Jordi Altayó
Delestrac, Paul
Hemani, Ahmed
author_facet Yang, Yu
González, Jordi Altayó
Delestrac, Paul
Hemani, Ahmed
contents The enhanced efficiency of hardware accelerators, including Single Instruction Multiple Data (SIMD) architectures and Coarse-Grained Reconfigurable Architectures (CGRAs), is driving significant advancements in Artificial Intelligence and Machine Learning (AI/ML) applications. These applications frequently involve data streaming operations comprised of numerous vector calculations inherently amenable to parallelization. However, despite considerable progress in hardware accelerator design, their potential remains constrained by conventional instruction set architectures (ISAs). Traditional ISAs, primarily designed for microprocessors and accelerators, emphasize computation while often neglecting instruction composability and inter-instruction cooperation. This limitation results in rigid ISAs that are difficult to extend and suffer from large control overhead in their hardware implementations. To address this, we present a novel composable instruction set (CIS) architecture, designed with both spatial and temporal composability, making it well-suited for data streaming applications. The proposed CIS utilizes a small instruction set, yet efficiently implements complex, multi-level loop structures essential for accelerating data streaming workloads. Furthermore, CIS adopts a resource-centric approach, facilitating straightforward extension through the integration of new hardware resources, enabling the creation of custom, heterogeneous computing platforms. Our results comparing performance between the proposed CIS and other state-of-the-art ISAs demonstrate that a CIS-based architecture significantly outperforms existing solutions, achieving near-optimal processing element (PE) utilization.
format Preprint
id arxiv_https___arxiv_org_abs_2407_00207
institution arXiv
publishDate 2024
record_format arxiv
spellingShingle CIS: Composable Instruction Set for Data Streaming Applications
Yang, Yu
González, Jordi Altayó
Delestrac, Paul
Hemani, Ahmed
Hardware Architecture
The enhanced efficiency of hardware accelerators, including Single Instruction Multiple Data (SIMD) architectures and Coarse-Grained Reconfigurable Architectures (CGRAs), is driving significant advancements in Artificial Intelligence and Machine Learning (AI/ML) applications. These applications frequently involve data streaming operations comprised of numerous vector calculations inherently amenable to parallelization. However, despite considerable progress in hardware accelerator design, their potential remains constrained by conventional instruction set architectures (ISAs). Traditional ISAs, primarily designed for microprocessors and accelerators, emphasize computation while often neglecting instruction composability and inter-instruction cooperation. This limitation results in rigid ISAs that are difficult to extend and suffer from large control overhead in their hardware implementations. To address this, we present a novel composable instruction set (CIS) architecture, designed with both spatial and temporal composability, making it well-suited for data streaming applications. The proposed CIS utilizes a small instruction set, yet efficiently implements complex, multi-level loop structures essential for accelerating data streaming workloads. Furthermore, CIS adopts a resource-centric approach, facilitating straightforward extension through the integration of new hardware resources, enabling the creation of custom, heterogeneous computing platforms. Our results comparing performance between the proposed CIS and other state-of-the-art ISAs demonstrate that a CIS-based architecture significantly outperforms existing solutions, achieving near-optimal processing element (PE) utilization.
title CIS: Composable Instruction Set for Data Streaming Applications
topic Hardware Architecture
url https://arxiv.org/abs/2407.00207