Bibliographic Details
Main Authors: Khizbullin, Dmitrii; de Andrade, Eduardo Rocha; Nguyen, Thanh Hau; Ferreira, Matheus Pedroza; Pugh, David R.
Format: Preprint
Published: 2024
Subjects: Machine Learning; Hardware Architecture; Performance
Online Access: https://arxiv.org/abs/2405.16623
_version_ 1866929602998304768
author Khizbullin, Dmitrii
de Andrade, Eduardo Rocha
Nguyen, Thanh Hau
Ferreira, Matheus Pedroza
Pugh, David R.
contents With the recent popularity of neural networks comes the need for efficient serving of inference workloads. A neural network inference workload can be represented as a computational graph with nodes as operators that transform multidimensional tensors. The tensors can be transposed and/or tiled in a combinatorially large number of ways, and some configurations lead to accelerated inference. We propose TGraph, a neural graph architecture that screens for fast configurations of the target computational graph and thus acts as an artificial intelligence (AI) tensor compiler, in contrast to traditional heuristics-based compilers. The proposed solution improves mean Kendall's $τ$ across the layout collections of TpuGraphs from 29.8% for the reliable baseline to 67.4% for TGraph. We estimate the potential CO$_2$ emission reduction associated with our work to be equivalent to over 50% of the total household emissions in the areas hosting AI-oriented data centers.
format Preprint
id arxiv_https___arxiv_org_abs_2405_16623
institution arXiv
publishDate 2024
record_format arxiv
title Graph neural networks with configuration cross-attention for tensor compilers
topic Machine Learning
Hardware Architecture
Performance
url https://arxiv.org/abs/2405.16623