Library Catalog

Bibliographic Details
Main Authors:	Lin, Jun-Liang; Madduri, Kamesh; Kandemir, Mahmut Taylan
Format:	Preprint
Published:	2026
Subjects:	Distributed, Parallel, and Cluster Computing; Artificial Intelligence; Machine Learning
Online Access:	https://arxiv.org/abs/2604.16715

Similar Items

GraphPipe: Improving Performance and Scalability of DNN Training with Graph Pipeline Parallelism
by: Jeon, Byungsoo, et al.
Published: (2024)

TorchGT: A Holistic System for Large-scale Graph Transformer Training
by: Zhang, Meng, et al.
Published: (2024)

Plexus: Taming Billion-edge Graphs with 3D Parallel Full-graph GNN Training
by: Ranjan, Aditya K., et al.
Published: (2025)

PipeOffload: Improving Scalability of Pipeline Parallelism with Memory Optimization
by: Wan, Xinyi, et al.
Published: (2025)

Communication-free Sampling and 4D Hybrid Parallelism for Scalable Mini-batch GNN Training
by: Wei, Cunyang, et al.
Published: (2026)

Context Parallelism for Scalable Million-Token Inference
by: Yang, Amy, et al.
Published: (2024)

Parallel-friendly Spatio-Temporal Graph Learning for Photovoltaic Degradation Analysis at Scale
by: Fan, Yangxin, et al.
Published: (2024)

Guard: Scalable Straggler Detection and Node Health Management for Large-Scale Training
by: Liu, Guanliang, et al.
Published: (2026)

BitPipe: Bidirectional Interleaved Pipeline Parallelism for Accelerating Large Models Training
by: Wu, Houming, et al.
Published: (2024)

AdaptiveLoad: Towards Efficient Video Diffusion Transformer Training
by: Guo, Yucheng, et al.
Published: (2026)

EE-LLM: Large-Scale Training and Inference of Early-Exit Large Language Models with 3D Parallelism
by: Chen, Yanxi, et al.
Published: (2023)

Piper: Efficient Large-Scale MoE Training via Resource Modeling and Pipelined Hybrid Parallelism
by: Dash, Sajal, et al.
Published: (2026)

TawPipe: Topology-Aware Weight Pipeline Parallelism for Accelerating Long-Context Large Models Training
by: Wu, Houming, et al.
Published: (2025)

Laminar: A Scalable Asynchronous RL Post-Training Framework
by: Sheng, Guangming, et al.
Published: (2025)

Semi-decentralized Training of Spatio-Temporal Graph Neural Networks for Traffic Prediction
by: Kralj, Ivan, et al.
Published: (2024)

GraphGen+: Advancing Distributed Subgraph Generation and Graph Learning On Industrial Graphs
by: Jin, Yue, et al.
Published: (2025)

GSplit: Scaling Graph Neural Network Training on Large Graphs via Split-Parallelism
by: Polisetty, Sandeep, et al.
Published: (2023)

TAPAS: Fast and Automatic Derivation of Tensor Parallel Strategies for Large Neural Networks
by: Shi, Ziji, et al.
Published: (2023)

A Parallel Alternative for Energy-Efficient Neural Network Training and Inferencing
by: Seal, Sudip K., et al.
Published: (2025)

Zero Bubble Pipeline Parallelism
by: Qi, Penghui, et al.
Published: (2023)

DisagMoE: Computation-Communication overlapped MoE Training via Disaggregated AF-Pipe Parallelism
by: Zeng, Zhichen, et al.
Published: (2026)

Taming the Long-Tail: Efficient Reasoning RL Training with Adaptive Drafter
by: Hu, Qinghao, et al.
Published: (2025)

Adaptive Graph Pruning with Sudden-Events Evaluation for Traffic Prediction using Online Semi-Decentralized ST-GNNs
by: Kralj, Ivan, et al.
Published: (2025)

Distributed Graph Neural Network Inference With Just-In-Time Compilation For Industry-Scale Graphs
by: Wu, Xiabao, et al.
Published: (2025)

GPT Carry-On: Training Foundation Model for Customization Could Be Simple, Scalable and Affordable
by: Wangni, Jianqiao
Published: (2025)

Democratizing AI: Open-source Scalable LLM Training on GPU-based Supercomputers
by: Singh, Siddharth, et al.
Published: (2025)

Jet: Multilevel Graph Partitioning on Graphics Processing Units
by: Gilbert, Michael S., et al.
Published: (2023)

SPD: Sync-Point Drop for Efficient Tensor Parallelism of Large Language Models
by: Kim, Han-Byul, et al.
Published: (2025)

Adaptive Consensus Gradients Aggregation for Scaled Distributed Training
by: Choukroun, Yoni, et al.
Published: (2024)

Federated Graph Learning with Structure Proxy Alignment
by: Fu, Xingbo, et al.
Published: (2024)

Scalable Pretraining of Large Mixture of Experts Language Models on Aurora Super Computer
by: Vooturi, Dharma Teja, et al.
Published: (2026)

Longer Attention Span: Increasing Transformer Context Length with Sparse Graph Processing Techniques
by: Tomczak, Nathaniel, et al.
Published: (2025)

Training Ultra Long Context Language Model with Fully Pipelined Distributed Transformer
by: Yao, Jinghan, et al.
Published: (2024)

ALTO: Adaptive LoRA Tuning and Orchestration for Heterogeneous LoRA Training Workloads
by: Zuo, Jingwei, et al.
Published: (2026)

A 4D Hybrid Algorithm to Scale Parallel Training to Thousands of GPUs
by: Singh, Siddharth, et al.
Published: (2023)

Parallel Split Learning with Global Sampling
by: Kohankhaki, Mohammad, et al.
Published: (2024)

SEAFL: Enhancing Efficiency in Semi-Asynchronous Federated Learning through Adaptive Aggregation and Selective Training
by: Islam, Md Sirajul, et al.
Published: (2025)

FusionLLM: A Decentralized LLM Training System on Geo-distributed GPUs with Adaptive Compression
by: Tang, Zhenheng, et al.
Published: (2024)

Sampling Parallelism for Fast and Efficient Bayesian Learning
by: Özdemir, Asena Karolin, et al.
Published: (2026)

Gradient Correction in Federated Learning with Adaptive Optimization
by: Chen, Evan, et al.
Published: (2025)