Bibliographic Details
Main Authors: Chen, Weijian, He, Shuibing, Qu, Haoyang, Zhang, Xuechen
Format: Preprint
Published: 2024
Subjects: Distributed, Parallel, and Cluster Computing
Online Access: https://arxiv.org/abs/2409.00657
author Chen, Weijian
He, Shuibing
Qu, Haoyang
Zhang, Xuechen
contents Distributed training of graph neural networks (GNNs) has become a crucial technique for processing large graphs. Prevalent GNN frameworks are model-centric: they transfer massive volumes of graph vertex features to the GNN models, which creates a significant communication bottleneck. Recognizing that the model is often much smaller than the features, we propose LeapGNN, a feature-centric framework that reverses this paradigm by bringing GNN models to the vertex features. To make this approach effective, we first propose a micrograph-based training strategy that trains the model on a refined structure with better locality, reducing remote feature retrieval. Then, we devise a feature pre-gathering approach that merges multiple fetch operations into a single one to eliminate redundant feature transmissions. Finally, we employ a micrograph-based merging method that adjusts the number of micrographs on each worker to minimize kernel switches and synchronization overhead. Our experimental results show that LeapGNN achieves up to a 4.2x speedup over the state-of-the-art method, P3.
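The reasoning behind moving the model instead of the features, and behind merging fetches, can be illustrated with a rough sketch. This is not the authors' implementation: the function names (model_centric_bytes, feature_centric_bytes, pre_gather) and every size used below are hypothetical, and the byte estimate deliberately ignores activations, gradients, and synchronization traffic.

```python
# Hypothetical back-of-the-envelope sketch; all names and sizes are illustrative
# assumptions, not figures or APIs from the paper.


def model_centric_bytes(remote_vertices: int, feature_dim: int, bytes_per_value: int = 4) -> int:
    """Bytes moved when remote vertex features are shipped to the worker holding the model."""
    return remote_vertices * feature_dim * bytes_per_value


def feature_centric_bytes(num_params: int, hops: int, bytes_per_value: int = 4) -> int:
    """Bytes moved when only the model parameters migrate `hops` times to where features live.

    Simplification: ignores activations, gradients, and synchronization messages.
    """
    return num_params * hops * bytes_per_value


def pre_gather(requests: list[list[int]]) -> list[int]:
    """Merge several per-micrograph fetch lists into one deduplicated request.

    A vertex requested by more than one micrograph is fetched only once,
    which is the kind of redundancy a pre-gathering step aims to remove.
    """
    seen: set[int] = set()
    merged: list[int] = []
    for request in requests:
        for vertex in request:
            if vertex not in seen:
                seen.add(vertex)
                merged.append(vertex)
    return merged


if __name__ == "__main__":
    # Assumed workload: 100k remote vertices with 512-dim float32 features,
    # versus a 500k-parameter model that visits 3 workers.
    features = model_centric_bytes(100_000, 512)
    model = feature_centric_bytes(500_000, 3)
    print(f"model-centric:   {features:,} bytes")
    print(f"feature-centric: {model:,} bytes")
    print("merged fetch:", pre_gather([[1, 2, 3], [3, 4], [2, 5]]))
```

Under these assumed sizes, shipping features moves about 34 times more data than migrating the model. The real ratio depends on feature width, model size, and graph partitioning.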
format Preprint
institution arXiv
publishDate 2024
title HopGNN: Boosting Distributed GNN Training Efficiency via Feature-Centric Model Migration
topic Distributed, Parallel, and Cluster Computing
url https://arxiv.org/abs/2409.00657