Bibliographic Details
Main Authors: Bi, Chongke, Gao, Xin, Fu, Baofeng, Zhao, Yuheng, Chen, Siming, Zhao, Ying, Yang, Lu
Format: Preprint
Published: 2025
Subjects: Computer Vision and Pattern Recognition
Online Access: https://arxiv.org/abs/2506.23257
_version_ 1866912532219822080
author Bi, Chongke
Gao, Xin
Fu, Baofeng
Zhao, Yuheng
Chen, Siming
Zhao, Ying
Yang, Lu
contents Large-scale simulations on supercomputers have become important tools for users. However, their scalability remains a problem due to the huge communication cost among parallel processes. Most existing communication latency analysis methods rely on physical link layer information, which is available only to administrators. In this paper, a framework called PCLVis is proposed to help general users analyze process communication latency (PCL) events. Instead of physical link layer information, PCLVis uses MPI process communication data for the analysis. First, a spatial PCL event locating method is developed: all highly correlated processes are grouped into a single cluster by constructing a process-correlation tree. Second, the propagation path of PCL events is analyzed by constructing a communication-dependency-based directed acyclic graph (DAG), which helps users interactively explore a PCL event through the temporal evolution of a located PCL event cluster. In this graph, a sliding window algorithm is designed to generate an abstraction of the PCL events. Meanwhile, a new glyph called the communication state glyph (CS-Glyph) is designed for each process to show its communication states, including its incoming and outgoing messages and its load balance. Each leaf node can be unfolded to view additional information. Third, a PCL event attribution strategy is formulated to help users optimize their simulations. The effectiveness of the PCLVis framework is demonstrated by analyzing the PCL events of several simulations running on the TH-1A supercomputer. By using the proposed framework, users can greatly improve the efficiency of their simulations.
format Preprint
id arxiv_https___arxiv_org_abs_2506_23257
institution arXiv
publishDate 2025
record_format arxiv
title PCLVis: Visual Analytics of Process Communication Latency in Large-Scale Simulation
topic Computer Vision and Pattern Recognition
url https://arxiv.org/abs/2506.23257