Staff View: :: Library Catalog

Saved in:

Bibliographic Details
Main Authors:	Mukhodopadhyay, Ujjaini, Tripathy, Alok, Selvitopi, Oguz, Yelick, Katherine, Buluc, Aydin
Format:	Preprint
Published:	2025
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2504.04673
Tags:	Add Tag No Tags, Be the first to tag this record!

_version_	1866912312954191872
author	Mukhodopadhyay, Ujjaini Tripathy, Alok Selvitopi, Oguz Yelick, Katherine Buluc, Aydin
author_facet	Mukhodopadhyay, Ujjaini Tripathy, Alok Selvitopi, Oguz Yelick, Katherine Buluc, Aydin
contents	Graph Neural Networks (GNNs) are a computationally efficient method to learn embeddings and classifications on graph data. However, GNN training has low computational intensity, making communication costs the bottleneck for scalability. Sparse-matrix dense-matrix multiplication (SpMM) is the core computational operation in full-graph training of GNNs. Previous work parallelizing this operation focused on sparsity-oblivious algorithms, where matrix elements are communicated regardless of the sparsity pattern. This leads to a predictable communication pattern that can be overlapped with computation and enables the use of collective communication operations at the expense of wasting significant bandwidth by communicating unnecessary data. We develop sparsity-aware algorithms that tackle the communication bottlenecks in GNN training with three novel approaches. First, we communicate only the necessary matrix elements. Second, we utilize a graph partitioning model to reorder the matrix and drastically reduce the amount of communicated elements. Finally, we address the high load imbalance in communication with a tailored partitioning model, which minimizes both the total communication volume and the maximum sending volume. We further couple these sparsity-exploiting approaches with a communication-avoiding approach (1.5D parallel SpMM) in which submatrices are replicated to reduce communication. We explore the tradeoffs of these combined optimizations and show up to 14X improvement on 256 GPUs and on some instances reducing communication to almost zero resulting in a communication-free parallel training relative to a popular GNN framework based on communication-oblivious SpMM.
format	Preprint
id	arxiv_https___arxiv_org_abs_2504_04673
institution	arXiv
publishDate	2025
record_format	arxiv
spellingShingle	Sparsity-Aware Communication for Distributed Graph Neural Network Training Mukhodopadhyay, Ujjaini Tripathy, Alok Selvitopi, Oguz Yelick, Katherine Buluc, Aydin Machine Learning Graph Neural Networks (GNNs) are a computationally efficient method to learn embeddings and classifications on graph data. However, GNN training has low computational intensity, making communication costs the bottleneck for scalability. Sparse-matrix dense-matrix multiplication (SpMM) is the core computational operation in full-graph training of GNNs. Previous work parallelizing this operation focused on sparsity-oblivious algorithms, where matrix elements are communicated regardless of the sparsity pattern. This leads to a predictable communication pattern that can be overlapped with computation and enables the use of collective communication operations at the expense of wasting significant bandwidth by communicating unnecessary data. We develop sparsity-aware algorithms that tackle the communication bottlenecks in GNN training with three novel approaches. First, we communicate only the necessary matrix elements. Second, we utilize a graph partitioning model to reorder the matrix and drastically reduce the amount of communicated elements. Finally, we address the high load imbalance in communication with a tailored partitioning model, which minimizes both the total communication volume and the maximum sending volume. We further couple these sparsity-exploiting approaches with a communication-avoiding approach (1.5D parallel SpMM) in which submatrices are replicated to reduce communication. We explore the tradeoffs of these combined optimizations and show up to 14X improvement on 256 GPUs and on some instances reducing communication to almost zero resulting in a communication-free parallel training relative to a popular GNN framework based on communication-oblivious SpMM.
title	Sparsity-Aware Communication for Distributed Graph Neural Network Training
topic	Machine Learning
url	https://arxiv.org/abs/2504.04673

Similar Items