Library Catalog

Bibliographic Details
Main Authors:	Xu, Chao; Zhang, Xu; Luo, Zihang; Wu, Yuyan; Qian, Guoxin; Yao, Yufeng; Wang, Chihyung; Zhou, Jingbin
Format:	Preprint
Published:	2026
Subjects:	Distributed, Parallel, and Cluster Computing
Online Access:	https://arxiv.org/abs/2605.22428

Similar Items

gZCCL: Compression-Accelerated Collective Communication Framework for GPU Clusters
by: Huang, Jiajun, et al.
Published: (2023)

FUSCO: High-Performance Distributed Data Shuffling via Transformation-Communication Fusion
by: Zhu, Zhuoran, et al.
Published: (2025)

Rubick: Exploiting Job Reconfigurability for Deep Learning Cluster Scheduling
by: Zhang, Xinyi, et al.
Published: (2024)

Accelerating Heterogeneous Tensor Parallelism via Flexible Workload Control
by: Wang, Zhigang, et al.
Published: (2024)

An Efficient, Reliable and Observable Collective Communication Library in Large-scale GPU Training Clusters
by: Zhang, Mingjun, et al.
Published: (2025)

Air-FedGA: A Grouping Asynchronous Federated Learning Mechanism Exploiting Over-the-air Computation
by: Ma, Qianpiao, et al.
Published: (2025)

EPIC: Abstraction and Polymorphism of In-Network Collectives on Ethernet
by: Yuan, Yitao, et al.
Published: (2026)

Cross-region Model Training with Communication-Computation Overlapping and Delay Compensation
by: Zhu, Ying, et al.
Published: (2025)

Accelerating Edge Inference for Distributed MoE Models with Latency-Optimized Expert Placement
by: Wu, Tian, et al.
Published: (2025)

Distributed Consensus Network: A Modularized Communication Framework and Reliability Probabilistic Analysis
by: Li, Yuetai, et al.
Published: (2025)

On-the-fly Communication-and-Computing to Enable Representation Learning for Distributed Point Clouds
by: Chen, Xu, et al.
Published: (2024)

ZCCL: Significantly Improving Collective Communication With Error-Bounded Lossy Compression
by: Huang, Jiajun, et al.
Published: (2025)

Accelerating Compound LLM Training Workloads with Maestro
by: Yuan, Xiulong, et al.
Published: (2026)

Accelerating Distributed MoE Training and Inference with Lina
by: Li, Jiamin, et al.
Published: (2022)

Generic Multicast (Extended Version)
by: Bolina, José Augusto, et al.
Published: (2024)

AES-SpMM: Balancing Accuracy and Speed by Adaptive Edge Sampling Strategy to Accelerate SpMM in GNNs
by: Song, Yingchen, et al.
Published: (2025)

GPU-Accelerated Distributed QAOA on Large-scale HPC Ecosystems
by: Xu, Zhihao, et al.
Published: (2025)

Mycroft: Tracing Dependencies in Collective Communication Towards Reliable LLM Training
by: Deng, Yangtao, et al.
Published: (2025)

Prime Collective Communications Library -- Technical Report
by: Keiblinger, Michael, et al.
Published: (2025)

ZipCCL: Efficient Lossless Data Compression of Communication Collectives for Accelerating LLM Training
by: Lin, Wenxiang, et al.
Published: (2026)

Hyperion: Hierarchical Scheduling for Parallel LLM Acceleration in Multi-tier Networks
by: Ma, Mulei, et al.
Published: (2025)

Relay Buffer Independent Communication over Pooled HBM for Efficient MoE Inference on Ascend
by: Hu, Tianlun, et al.
Published: (2026)

Accelerating End-Cloud Collaborative Inference via Near Bubble-free Pipeline Optimization
by: Gao, Luyao, et al.
Published: (2024)

HiCCL: A Hierarchical Collective Communication Library
by: Hidayetoglu, Mert, et al.
Published: (2024)

Union: An Automatic Workload Manager for Accelerating Network Simulation
by: Wang, Xin, et al.
Published: (2024)

Collaborative Inference Acceleration with Non-Penetrative Tensor Partitioning
by: Liu, Zhibang, et al.
Published: (2025)

A HPX Communication Benchmark: Distributed FFT using Collectives
by: Strack, Alexander, et al.
Published: (2025)

Extracting the Potential of Emerging Hardware Accelerators for Symmetric Eigenvalue Decomposition
by: Wang, Hansheng, et al.
Published: (2024)

Accelerating Mixture-of-Experts Inference by Hiding Offloading Latency with Speculative Decoding
by: Wang, Zhibin, et al.
Published: (2025)

Opara: Exploiting Operator Parallelism for Expediting DNN Inference on GPUs
by: Chen, Aodong, et al.
Published: (2023)

Communication-Efficient Distributed Learning with Local Immediate Error Compensation
by: Cheng, Yifei, et al.
Published: (2024)

exa-AMD: A Scalable Workflow for Accelerating AI-Assisted Materials Discovery and Design
by: Moraru, Maxim, et al.
Published: (2025)

LMDeploy Accelerates Mixed-Precision LLM Inference with TurboMind
by: Zhang, Li, et al.
Published: (2025)

A Portable Framework for Accelerating Stencil Computations on Modern Node Architectures
by: Sai, Ryuichi, et al.
Published: (2023)

Accelerating OpenPangu Inference on NPU via Speculative Decoding
by: Dai, Yuntao, et al.
Published: (2026)

RcLLM: Accelerating Generative Recommendation via Beyond-Prefix KV Caching
by: Zhao, Zhan, et al.
Published: (2026)

DSV: Exploiting Dynamic Sparsity to Accelerate Large-Scale Video DiT Training
by: Tan, Xin, et al.
Published: (2025)

FlowMesh: A Service Fabric for Composable LLM Workflows
by: Shen, Junyi, et al.
Published: (2025)

PALM: A Efficient Performance Simulator for Tiled Accelerators with Large-scale Model Training
by: Fang, Jiahao, et al.
Published: (2024)

Exploiting Stragglers in Distributed Computing Systems with Task Grouping
by: Adikari, Tharindu, et al.
Published: (2024)