Bibliographic Details
Main Authors: Tarraga-Moreno, Joaquin, Escudero-Sahuquillo, Jesus, Garcia, Pedro Javier, Quiles, Francisco J.
Format: Preprint
Published: 2025
Subjects: Hardware Architecture
Online Access: https://arxiv.org/abs/2502.20965
_version_ 1866908605101375488
author Tarraga-Moreno, Joaquin
Escudero-Sahuquillo, Jesus
Garcia, Pedro Javier
Quiles, Francisco J.
contents In the last decade, special-purpose computing and storage devices, such as GPUs, TPUs, or high-speed storage, have been incorporated into the server nodes of supercomputers and data centers. The development of high-bandwidth memory (HBM) enabled a much more compact form factor for these devices, allowing several of them to be interconnected within a server node, typically through an intra-node interconnection network (e.g., PCIe, NVLink, or Infinity Fabric). These networks allow scaling up the number of special-purpose computing and storage devices per node. In turn, inter-node networks connect thousands of these devices placed in different server nodes of a supercomputer or data center. Unfortunately, the intra- and inter-node networks may become the system's bottleneck due to the increasing communication demand among accelerators from applications such as generative AI. Although current intra-node network designs alleviate this bottleneck by increasing intra-node bandwidth, we show in this paper that such high intra-node bandwidth may hinder inter-node communication performance when traffic from outside the node arrives at the intra-node devices and interferes with intra-node traffic. To analyze the impact of this interference, we have studied the communication operations of realistic traffic patterns that exploit intra-node communication. We have developed a generic intra- and inter-node simulation model based on OMNeT++ and modeled these communication operations. Our extensive simulation experiments confirm that increasing the intra-node network bandwidth and the number of computing devices per node (i.e., accelerators) is counterproductive for inter-node communication performance.
format Preprint
id arxiv_https___arxiv_org_abs_2502_20965
institution arXiv
publishDate 2025
record_format arxiv
title On the Impact of Intra-node Communication in the Performance of Supercomputer and Data Center Interconnection Networks
topic Hardware Architecture
url https://arxiv.org/abs/2502.20965