Internformat: :: Library Catalog

Gespeichert in:

Bibliographische Detailangaben
Hauptverfasser:	Shin, Yongwon, Kang, Dookyung, Sung, Hyojin
Format:	Preprint
Veröffentlicht:	2024
Schlagworte:	Hardware Architecture
Online-Zugang:	https://arxiv.org/abs/2412.19630
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

_version_	1866910979102605312
author	Shin, Yongwon Kang, Dookyung Sung, Hyojin
author_facet	Shin, Yongwon Kang, Dookyung Sung, Hyojin
contents	Processing-in-DRAM (DRAM-PIM) has emerged as a promising technology for accelerating memory-intensive operations in modern applications, such as Large Language Models (LLMs). Despite its potential, current software stacks for DRAM-PIM face significant challenges, including reliance on hand-tuned libraries that hinder programmability, limited support for high-level abstractions, and the lack of systematic optimization frameworks. To address these limitations, we present ATiM, a search-based optimizing tensor compiler for UPMEM. Key features of ATiM include: (1) automated searches of the joint search space for host and kernel tensor programs, (2) PIM-aware optimizations for efficiently handling boundary conditions, and (3) improved search algorithms for the expanded search space of UPMEM systems. Our experimental results on UPMEM hardware demonstrate performance gains of up to 6.18$\times$ for various UPMEM benchmark kernels and 8.21$\times$ for GPT-J layers. To the best of our knowledge, ATiM is the first tensor compiler to provide fully automated, autotuning-integrated code generation support for a DRAM-PIM system. By bridging the gap between high-level tensor computation abstractions and low-level hardware-specific requirements, ATiM establishes a foundation for advancing DRAM-PIM programmability and enabling streamlined optimization.
format	Preprint
id	arxiv_https___arxiv_org_abs_2412_19630
institution	arXiv
publishDate	2024
record_format	arxiv
spellingShingle	ATiM: Autotuning Tensor Programs for Processing-in-DRAM Shin, Yongwon Kang, Dookyung Sung, Hyojin Hardware Architecture Processing-in-DRAM (DRAM-PIM) has emerged as a promising technology for accelerating memory-intensive operations in modern applications, such as Large Language Models (LLMs). Despite its potential, current software stacks for DRAM-PIM face significant challenges, including reliance on hand-tuned libraries that hinder programmability, limited support for high-level abstractions, and the lack of systematic optimization frameworks. To address these limitations, we present ATiM, a search-based optimizing tensor compiler for UPMEM. Key features of ATiM include: (1) automated searches of the joint search space for host and kernel tensor programs, (2) PIM-aware optimizations for efficiently handling boundary conditions, and (3) improved search algorithms for the expanded search space of UPMEM systems. Our experimental results on UPMEM hardware demonstrate performance gains of up to 6.18$\times$ for various UPMEM benchmark kernels and 8.21$\times$ for GPT-J layers. To the best of our knowledge, ATiM is the first tensor compiler to provide fully automated, autotuning-integrated code generation support for a DRAM-PIM system. By bridging the gap between high-level tensor computation abstractions and low-level hardware-specific requirements, ATiM establishes a foundation for advancing DRAM-PIM programmability and enabling streamlined optimization.
title	ATiM: Autotuning Tensor Programs for Processing-in-DRAM
topic	Hardware Architecture
url	https://arxiv.org/abs/2412.19630

Ähnliche Einträge