MARC21: :: Library Catalog

Bibliographic Details
Main Authors:	Liu, Hangda; Diao, Boyu; Yang, Yu; Chen, Wenxin; Peng, Xiaohui; Xu, Yongjun
Format:	Preprint
Published:	2025
Subjects:	Distributed, Parallel, and Cluster Computing
Online Access:	https://arxiv.org/abs/2502.11407

_version_	1866913692707192832
author	Liu, Hangda; Diao, Boyu; Yang, Yu; Chen, Wenxin; Peng, Xiaohui; Xu, Yongjun
contents	High-performance deep learning depends on efficient tensor programs. In recent years, automatic tensor program optimization, also known as tensor compilation, has emerged as the primary approach to generating efficient tensor programs. However, generating higher-performance kernels in a shorter time remains the key challenge. In this paper, we present Gensor, a graph-based construction tensor compilation method for deep learning, to further improve the performance of construction tensor compilation. Unlike existing tree-based methods, Gensor abstracts the construction space into a graph structure and then explores it with Markov analysis. Gensor treats tensor programs as states and models scheduling primitives as transition actions between these states. The process of tensor program construction optimization is therefore abstracted as a graph traversal. This approach expands the optimization space, improving operator performance while keeping optimization fast. Extensive experiments with typical operators demonstrate that Gensor significantly outperforms state-of-the-art methods on GPUs for both cloud servers and edge devices. Gensor can generate operator kernels in seconds, with performance increasing by 18% on average and by up to 30%. It also achieves high speedups for end-to-end models such as ResNet-50 and GPT-2, with an average acceleration of 20%.
format	Preprint
id	arxiv_https___arxiv_org_abs_2502_11407
institution	arXiv
publishDate	2025
record_format	arxiv
title	Gensor: A Graph-based Construction Tensor Compilation Method for Deep Learning
topic	Distributed, Parallel, and Cluster Computing
url	https://arxiv.org/abs/2502.11407
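The abstract describes tensor programs as states and scheduling primitives as transitions between them, explored with Markov analysis, but this record gives no details of Gensor's actual state representation, primitives, cost model, or transition probabilities. The Python sketch below is therefore a hypothetical toy, not Gensor's method. Assumptions: a state is a tile-size triple (tm, tn, tk) for an M x N x K matrix multiplication; the two "primitives" (double or halve one tile size) are invented; toy_cost is an invented analytic cost model with an assumed shared-memory budget; and transitions are drawn from a softmax over negative cost, forming a Markov random walk over the state graph.

```python
import math
import random
from typing import Dict, List, Tuple

# HYPOTHETICAL state: tile sizes (tm, tn, tk) for an M x N x K matmul kernel.
State = Tuple[int, int, int]


def neighbors(s: State, dims: State) -> List[State]:
    """Apply two HYPOTHETICAL scheduling primitives (double or halve one tile
    size). Each valid result is an outgoing edge of state s in the graph."""
    out: List[State] = []
    for i in range(3):
        for new in (s[i] * 2, s[i] // 2):
            # Keep only tile sizes that evenly divide the problem dimension.
            if 1 <= new <= dims[i] and dims[i] % new == 0:
                t = list(s)
                t[i] = new
                out.append((t[0], t[1], t[2]))
    return out


def toy_cost(s: State, dims: State, smem_budget: int = 12288) -> float:
    """INVENTED cost model: number of tile iterations plus a large penalty when
    the tile working set (in floats) exceeds an assumed on-chip budget."""
    tm, tn, tk = s
    m, n, k = dims
    tiles = (m // tm) * (n // tn) * (k // tk)
    footprint = tm * tk + tk * tn + tm * tn
    penalty = 0.0 if footprint <= smem_budget else (footprint - smem_budget) * 10.0
    return tiles + penalty


def markov_explore(start: State, dims: State, steps: int = 200,
                   temperature: float = 1.0, seed: int = 0):
    """Random walk over the construction graph; returns the best state seen."""
    rng = random.Random(seed)
    graph: Dict[State, List[State]] = {}  # each state stored once, however it was reached
    cache: Dict[State, float] = {}

    def cost(s: State) -> float:
        if s not in cache:
            cache[s] = toy_cost(s, dims)
        return cache[s]

    current = best = start
    for _ in range(steps):
        if current not in graph:
            graph[current] = neighbors(current, dims)
        succ = graph[current]
        if not succ:
            break
        # Transition probabilities: softmax over negative cost (cheaper is likelier).
        scores = [-cost(n) / temperature for n in succ]
        top = max(scores)
        weights = [math.exp(x - top) for x in scores]
        current = rng.choices(succ, weights=weights, k=1)[0]
        if cost(current) < cost(best):
            best = current
    return best, cost(best), graph


best, best_cost, _ = markov_explore((1, 1, 1), (1024, 1024, 1024))
print(best, best_cost)
```

Because graph is keyed by state, a tile configuration reached through different primitive orders is explored once, which is one possible reading of the abstract's contrast between graph-based and tree-based exploration; a tree would keep a separate node per path.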
