Staff View: Graph neural networks with configuration cross-attention for tensor compilers :: Library Catalog

Bibliographic Details
Main Authors:	Khizbullin, Dmitrii; de Andrade, Eduardo Rocha; Nguyen, Thanh Hau; Ferreira, Matheus Pedroza; Pugh, David R.
Format:	Preprint
Published:	2024
Subjects:	Machine Learning; Hardware Architecture; Performance
Online Access:	https://arxiv.org/abs/2405.16623

_version_	1866929602998304768
author	Khizbullin, Dmitrii; de Andrade, Eduardo Rocha; Nguyen, Thanh Hau; Ferreira, Matheus Pedroza; Pugh, David R.
author_facet	Khizbullin, Dmitrii; de Andrade, Eduardo Rocha; Nguyen, Thanh Hau; Ferreira, Matheus Pedroza; Pugh, David R.
contents	With the recent popularity of neural networks comes the need for efficient serving of inference workloads. A neural network inference workload can be represented as a computational graph whose nodes are operators that transform multidimensional tensors. The tensors can be transposed and/or tiled in a combinatorially large number of ways, and some configurations lead to accelerated inference. We propose TGraph, a neural graph architecture that allows screening for fast configurations of the target computational graph, thus representing an artificial intelligence (AI) tensor compiler, in contrast to traditional heuristics-based compilers. The proposed solution improves mean Kendall's $τ$ across the layout collections of TpuGraphs from 29.8% for the reliable baseline to 67.4% for TGraph. We estimate the potential CO$_2$ emission reduction associated with our work to be equivalent to over 50% of the total household emissions in the areas hosting AI-oriented data centers.
format	Preprint
id	arxiv_https___arxiv_org_abs_2405_16623
institution	arXiv
publishDate	2024
record_format	arxiv
title	Graph neural networks with configuration cross-attention for tensor compilers
topic	Machine Learning; Hardware Architecture; Performance
url	https://arxiv.org/abs/2405.16623

Similar Items