Bibliographic Details
Main Authors: Guha, Etash; Jiang, Tianxiao; Deng, Andrew; Zhang, Jian; Annamalai, Muthu
Format: Preprint
Published: 2025
Subjects: Distributed, Parallel, and Cluster Computing; Machine Learning; Programming Languages
Online Access: https://arxiv.org/abs/2511.01872
author Guha, Etash
Jiang, Tianxiao
Deng, Andrew
Zhang, Jian
Annamalai, Muthu
contents Mapping the dataflow graph of an ML model onto a reconfigurable system is difficult, because different mappings achieve different throughputs and consume resources differently. Choosing among mappings therefore requires a model that estimates their throughput, since measuring throughput directly is expensive. Many existing approaches use a hand-designed analytical model that relies on proxy features or intuition, which introduces error. We provide a learned approach that predicts throughput 31%-52% more accurately over a variety of graphs. In addition, our approach shows no accuracy degradation after performance annotations are removed. We show that using this approach results in compiled graphs that are 5.6% faster.
format Preprint
id arxiv_https___arxiv_org_abs_2511_01872
institution arXiv
publishDate 2025
record_format arxiv
title Learned Cost Model for Placement on Reconfigurable Dataflow Hardware
topic Distributed, Parallel, and Cluster Computing
Machine Learning
Programming Languages
url https://arxiv.org/abs/2511.01872