Table of Contents: :: Library Catalog

Saved in:

Bibliographic Details
Main Authors:	Raman, Siddhartha Raman Sundara, Kulkarni, Jaydeep P.
Format:	Preprint
Published:	2026
Subjects:	Hardware Architecture
Online Access:	https://arxiv.org/abs/2602.14262
Tags:	Add Tag No Tags, Be the first to tag this record!

Table of Contents:

We present a tightly integrated and unified near-memory GPU architecture that delivers 6 to 16 times speedup and 6 to 13 times energy savings across Convolutional Neural Networks, Graph Convolutional Networks, Linear Programming, Large Language Models, and Ising workloads compared to MIAOW GPU. The design includes a custom sparsity-aware near-memory circuit providing about 1.5 times energy savings, and a lightweight softmax circuit providing about 1.6 times energy savings. The architecture supports reconfigurable compute up to INT16 with dynamic resolution updates and scales efficiently across problem sizes. ABI-enabled MI300 and Blackwell systems achieve about 4.5 times speedup over baseline MI300 and Blackwell.

Similar Items