| Main Authors: | Guha, Etash; Jiang, Tianxiao; Deng, Andrew; Zhang, Jian; Annamalai, Muthu |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | Distributed, Parallel, and Cluster Computing; Machine Learning; Programming Languages |
| Online Access: | https://arxiv.org/abs/2511.01872 |
| _version_ | 1866917057743814656 |
|---|---|
| author | Guha, Etash; Jiang, Tianxiao; Deng, Andrew; Zhang, Jian; Annamalai, Muthu |
| author_facet | Guha, Etash; Jiang, Tianxiao; Deng, Andrew; Zhang, Jian; Annamalai, Muthu |
| contents | Mapping the dataflow graph of an ML model onto a reconfigurable system is difficult, as different mappings yield different throughputs and consume resources differently. Choosing among them requires a model that estimates the throughput of a mapping, since fully measuring throughput is expensive. Many rely on a hand-designed analytical model built on proxy features or intuition, which introduces error. We provide a learned approach that predicts throughput 31%-52% more accurately over a variety of graphs. In addition, our approach shows no accuracy degradation after removing performance annotations. We show that using this approach results in 5.6% faster compiled graphs. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2511_01872 |
| institution | arXiv |
| publishDate | 2025 |
| record_format | arxiv |
| title | Learned Cost Model for Placement on Reconfigurable Dataflow Hardware |
| topic | Distributed, Parallel, and Cluster Computing; Machine Learning; Programming Languages |
| url | https://arxiv.org/abs/2511.01872 |