| Main Authors: | Lin, Jun-Liang, Madduri, Kamesh, Kandemir, Mahmut Taylan |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2604.16715 |
Similar Items
GraphPipe: Improving Performance and Scalability of DNN Training with Graph Pipeline Parallelism
by: Jeon, Byungsoo, et al.
Published: (2024)
TorchGT: A Holistic System for Large-scale Graph Transformer Training
by: Zhang, Meng, et al.
Published: (2024)
Plexus: Taming Billion-edge Graphs with 3D Parallel Full-graph GNN Training
by: Ranjan, Aditya K., et al.
Published: (2025)
PipeOffload: Improving Scalability of Pipeline Parallelism with Memory Optimization
by: Wan, Xinyi, et al.
Published: (2025)
Communication-free Sampling and 4D Hybrid Parallelism for Scalable Mini-batch GNN Training
by: Wei, Cunyang, et al.
Published: (2026)
Context Parallelism for Scalable Million-Token Inference
by: Yang, Amy, et al.
Published: (2024)
Parallel-friendly Spatio-Temporal Graph Learning for Photovoltaic Degradation Analysis at Scale
by: Fan, Yangxin, et al.
Published: (2024)
Guard: Scalable Straggler Detection and Node Health Management for Large-Scale Training
by: Liu, Guanliang, et al.
Published: (2026)
BitPipe: Bidirectional Interleaved Pipeline Parallelism for Accelerating Large Models Training
by: Wu, Houming, et al.
Published: (2024)
AdaptiveLoad: Towards Efficient Video Diffusion Transformer Training
by: Guo, Yucheng, et al.
Published: (2026)
EE-LLM: Large-Scale Training and Inference of Early-Exit Large Language Models with 3D Parallelism
by: Chen, Yanxi, et al.
Published: (2023)
Piper: Efficient Large-Scale MoE Training via Resource Modeling and Pipelined Hybrid Parallelism
by: Dash, Sajal, et al.
Published: (2026)
TawPipe: Topology-Aware Weight Pipeline Parallelism for Accelerating Long-Context Large Models Training
by: Wu, Houming, et al.
Published: (2025)
Laminar: A Scalable Asynchronous RL Post-Training Framework
by: Sheng, Guangming, et al.
Published: (2025)
Semi-decentralized Training of Spatio-Temporal Graph Neural Networks for Traffic Prediction
by: Kralj, Ivan, et al.
Published: (2024)
GraphGen+: Advancing Distributed Subgraph Generation and Graph Learning On Industrial Graphs
by: Jin, Yue, et al.
Published: (2025)
GSplit: Scaling Graph Neural Network Training on Large Graphs via Split-Parallelism
by: Polisetty, Sandeep, et al.
Published: (2023)
TAPAS: Fast and Automatic Derivation of Tensor Parallel Strategies for Large Neural Networks
by: Shi, Ziji, et al.
Published: (2023)
A Parallel Alternative for Energy-Efficient Neural Network Training and Inferencing
by: Seal, Sudip K., et al.
Published: (2025)
Zero Bubble Pipeline Parallelism
by: Qi, Penghui, et al.
Published: (2023)
DisagMoE: Computation-Communication overlapped MoE Training via Disaggregated AF-Pipe Parallelism
by: Zeng, Zhichen, et al.
Published: (2026)
Taming the Long-Tail: Efficient Reasoning RL Training with Adaptive Drafter
by: Hu, Qinghao, et al.
Published: (2025)
Adaptive Graph Pruning with Sudden-Events Evaluation for Traffic Prediction using Online Semi-Decentralized ST-GNNs
by: Kralj, Ivan, et al.
Published: (2025)
Distributed Graph Neural Network Inference With Just-In-Time Compilation For Industry-Scale Graphs
by: Wu, Xiabao, et al.
Published: (2025)
GPT Carry-On: Training Foundation Model for Customization Could Be Simple, Scalable and Affordable
by: Wangni, Jianqiao
Published: (2025)
Democratizing AI: Open-source Scalable LLM Training on GPU-based Supercomputers
by: Singh, Siddharth, et al.
Published: (2025)
Jet: Multilevel Graph Partitioning on Graphics Processing Units
by: Gilbert, Michael S., et al.
Published: (2023)
SPD: Sync-Point Drop for Efficient Tensor Parallelism of Large Language Models
by: Kim, Han-Byul, et al.
Published: (2025)
Adaptive Consensus Gradients Aggregation for Scaled Distributed Training
by: Choukroun, Yoni, et al.
Published: (2024)
Federated Graph Learning with Structure Proxy Alignment
by: Fu, Xingbo, et al.
Published: (2024)
Scalable Pretraining of Large Mixture of Experts Language Models on Aurora Super Computer
by: Vooturi, Dharma Teja, et al.
Published: (2026)
Longer Attention Span: Increasing Transformer Context Length with Sparse Graph Processing Techniques
by: Tomczak, Nathaniel, et al.
Published: (2025)
Training Ultra Long Context Language Model with Fully Pipelined Distributed Transformer
by: Yao, Jinghan, et al.
Published: (2024)
ALTO: Adaptive LoRA Tuning and Orchestration for Heterogeneous LoRA Training Workloads
by: Zuo, Jingwei, et al.
Published: (2026)
A 4D Hybrid Algorithm to Scale Parallel Training to Thousands of GPUs
by: Singh, Siddharth, et al.
Published: (2023)
Parallel Split Learning with Global Sampling
by: Kohankhaki, Mohammad, et al.
Published: (2024)
SEAFL: Enhancing Efficiency in Semi-Asynchronous Federated Learning through Adaptive Aggregation and Selective Training
by: Islam, Md Sirajul, et al.
Published: (2025)
FusionLLM: A Decentralized LLM Training System on Geo-distributed GPUs with Adaptive Compression
by: Tang, Zhenheng, et al.
Published: (2024)
Sampling Parallelism for Fast and Efficient Bayesian Learning
by: Özdemir, Asena Karolin, et al.
Published: (2026)
Gradient Correction in Federated Learning with Adaptive Optimization
by: Chen, Evan, et al.
Published: (2025)