| Main Authors: | Chen, Weijian; He, Shuibing; Qu, Haoyang; Zhang, Xuechen |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | Distributed, Parallel, and Cluster Computing |
| Online Access: | https://arxiv.org/abs/2409.00657 |
| _version_ | 1866912018679726080 |
|---|---|
| author | Chen, Weijian; He, Shuibing; Qu, Haoyang; Zhang, Xuechen |
| author_facet | Chen, Weijian; He, Shuibing; Qu, Haoyang; Zhang, Xuechen |
| contents | Distributed training of graph neural networks (GNNs) has become a crucial technique for processing large graphs. Prevalent GNN frameworks are model-centric, necessitating the transfer of massive graph vertex features to GNN models, which leads to a significant communication bottleneck. Recognizing that the model size is often significantly smaller than the feature size, we propose LeapGNN, a feature-centric framework that reverses this paradigm by bringing GNN models to vertex features. To make it truly effective, we first propose a micrograph-based training strategy that trains the model using a refined structure with superior locality to reduce remote feature retrieval. Then, we devise a feature pre-gathering approach that merges multiple fetch operations into a single one to eliminate redundant feature transmissions. Finally, we employ a micrograph-based merging method that adjusts the number of micrographs for each worker to minimize kernel switches and synchronization overhead. Our experimental results demonstrate that LeapGNN achieves a performance speedup of up to 4.2x compared to the state-of-the-art method, namely P3. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2409_00657 |
| institution | arXiv |
| publishDate | 2024 |
| record_format | arxiv |
| title | HopGNN: Boosting Distributed GNN Training Efficiency via Feature-Centric Model Migration |
| topic | Distributed, Parallel, and Cluster Computing |
| url | https://arxiv.org/abs/2409.00657 |
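
The abstract describes feature pre-gathering as merging multiple fetch operations into a single one to eliminate redundant feature transmissions. The sketch below is only an illustration of that general idea, not the authors' implementation: the function names, the request lists, and the vertex IDs are all hypothetical, and it counts transferred vertex features rather than performing any real communication.

```python
from typing import List


def separate_fetch_count(requests: List[List[int]]) -> int:
    """Count features transferred when every request is fetched on its own.

    A vertex that appears in several requests is transferred once per request.
    """
    return sum(len(set(request)) for request in requests)


def pre_gather(requests: List[List[int]]) -> List[int]:
    """Merge all requests into one fetch, keeping each vertex ID only once."""
    merged = set()
    for request in requests:
        merged.update(request)
    return sorted(merged)


# Hypothetical remote vertex IDs needed by three successive fetch operations.
requests = [[3, 7, 9], [7, 9, 12], [3, 12, 15]]

separate = separate_fetch_count(requests)  # 3 + 3 + 3 transfers
merged = pre_gather(requests)              # one fetch of the distinct IDs

print(f"separate fetches transfer {separate} features")
print(f"one merged fetch transfers {len(merged)} features: {merged}")
```

In this toy input, vertices 3, 7, 9, and 12 are each requested twice, so the merged fetch moves fewer features than the separate fetches. How much a real system saves depends on how strongly the sampled neighborhoods overlap.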