Staff View: Learned Cost Model for Placement on Reconfigurable Dataflow Hardware :: Library Catalog

Bibliographic Details
Main Authors:	Guha, Etash; Jiang, Tianxiao; Deng, Andrew; Zhang, Jian; Annamalai, Muthu
Format:	Preprint
Published:	2025
Subjects:	Distributed, Parallel, and Cluster Computing; Machine Learning; Programming Languages
Online Access:	https://arxiv.org/abs/2511.01872

_version_	1866917057743814656
author	Guha, Etash; Jiang, Tianxiao; Deng, Andrew; Zhang, Jian; Annamalai, Muthu
author_facet	Guha, Etash; Jiang, Tianxiao; Deng, Andrew; Zhang, Jian; Annamalai, Muthu
contents	Mapping the dataflow graph of an ML model onto a reconfigurable system is difficult, because different mappings achieve different throughputs and consume the constrained hardware resources differently. Choosing among them requires a model that estimates the throughput of a mapping, since measuring throughput in full for every candidate is expensive. Many existing approaches rely on hand-designed analytical models built on proxy features or intuition, which introduces error. We present a learned approach that predicts throughput 31%-52% more accurately across a variety of graphs. Our approach also shows no loss of accuracy when performance annotations are removed. Using this approach yields compiled graphs that are 5.6% faster.
format	Preprint
id	arxiv_https___arxiv_org_abs_2511_01872
institution	arXiv
publishDate	2025
record_format	arxiv
title	Learned Cost Model for Placement on Reconfigurable Dataflow Hardware
topic	Distributed, Parallel, and Cluster Computing; Machine Learning; Programming Languages
url	https://arxiv.org/abs/2511.01872

Similar Items