Staff View: HopGNN: Boosting Distributed GNN Training Efficiency via Feature-Centric Model Migration

Bibliographic Details
Main Authors:	Chen, Weijian; He, Shuibing; Qu, Haoyang; Zhang, Xuechen
Format:	Preprint
Published:	2024
Subjects:	Distributed, Parallel, and Cluster Computing
Online Access:	https://arxiv.org/abs/2409.00657

_version_	1866912018679726080
author	Chen, Weijian; He, Shuibing; Qu, Haoyang; Zhang, Xuechen
author_facet	Chen, Weijian; He, Shuibing; Qu, Haoyang; Zhang, Xuechen
contents	Distributed training of graph neural networks (GNNs) has become a crucial technique for processing large graphs. Prevalent GNN frameworks are model-centric, necessitating the transfer of massive graph vertex features to GNN models, which leads to a significant communication bottleneck. Recognizing that the model size is often significantly smaller than the feature size, we propose LeapGNN, a feature-centric framework that reverses this paradigm by bringing GNN models to vertex features. To make it truly effective, we first propose a micrograph-based training strategy that trains the model using a refined structure with superior locality to reduce remote feature retrieval. Then, we devise a feature pre-gathering approach that merges multiple fetch operations into a single one to eliminate redundant feature transmissions. Finally, we employ a micrograph-based merging method that adjusts the number of micrographs for each worker to minimize kernel switches and synchronization overhead. Our experimental results demonstrate that LeapGNN achieves a performance speedup of up to 4.2x compared to the state-of-the-art method, namely P3.
format	Preprint
id	arxiv_https___arxiv_org_abs_2409_00657
institution	arXiv
publishDate	2024
record_format	arxiv
title	HopGNN: Boosting Distributed GNN Training Efficiency via Feature-Centric Model Migration
topic	Distributed, Parallel, and Cluster Computing
url	https://arxiv.org/abs/2409.00657

Similar Items