Bibliographic details
Main authors: Langhammer, Martin; Constantinides, George A.
Format: Preprint
Published: 2024
Subjects: Hardware Architecture
Online access: https://arxiv.org/abs/2401.04261
_version_ 1866917562185416704
author Langhammer, Martin
Constantinides, George A.
author_facet Langhammer, Martin
Constantinides, George A.
contents Current soft processor architectures for FPGAs do not utilize the potential of the massive parallelism available. FPGAs now support many thousands of embedded floating-point operators, and have similar computational densities to GPGPUs. Several soft GPGPU or SIMT processors have been published, but the reported large areas and modest Fmax make their widespread use unlikely for commercial designs. In this paper we take an alternative approach, building the soft GPU microarchitecture around the FPGA resource mix available. We demonstrate a statically scalable soft GPGPU processor (where both parameters and feature set can be determined at configuration time) that always closes timing at the peak speed of the slowest embedded component in the FPGA (DSP or hard memory), with a completely unconstrained compile into a current Intel Agilex FPGA. We also show dynamic scalability, where a subset of the thread space can be specified on an instruction-by-instruction basis. For one example core type, we show a logic range, depending on the configuration, of 4k to 10k ALMs, along with 24 to 32 DSP Blocks, and 50 to 250 M20K memories. All of these instances close timing at 771 MHz, a performance level limited only by the DSP Blocks. We describe our methodology for reliably achieving this clock rate by matching the processor pipeline structure to the physical structure of the FPGA fabric. We also benchmark several algorithms across a range of data sizes, and compare to a commercial soft RISC processor.
format Preprint
id arxiv_https___arxiv_org_abs_2401_04261
institution arXiv
publishDate 2024
record_format arxiv
title A Statically and Dynamically Scalable Soft GPGPU
topic Hardware Architecture
url https://arxiv.org/abs/2401.04261