| Main Authors: | Zhao, Bohan, Wang, Yuanhong, Liu, Chenglin, Pan, Jiagi, Yang, Guang, Liu, Ruitao, Zhang, Tingrui, Luo, Kai, Xu, Wei |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2512.03644 |
Similar Items
MegatronApp: Efficient and Comprehensive Management on Distributed LLM Training
by: Zhao, Bohan, et al.
Published: (2025)
A Readiness-Driven Runtime for Pipeline-Parallel Training under Runtime Variability
by: Liu, Ruitao, et al.
Published: (2026)
Varuna: Enabling Failure-Type Aware RDMA Failover
by: Wang, Xiaoyang, et al.
Published: (2026)
Uber's Failover Architecture: Reconciling Reliability and Efficiency in Hyperscale Microservice Infrastructure
by: Bansal, Mayank, et al.
Published: (2026)
On the Resilience of Fast Failover Routing Against Dynamic Link Failures
by: Dai, Wenkai, et al.
Published: (2024)
FEPLB: Exploiting Copy Engines for Nearly Free MoE Load Balancing in Distributed Training
by: Qi, Shuyao, et al.
Published: (2026)
CXL Shared Memory Programming: Barely Distributed and Almost Persistent
by: Xu, Yi, et al.
Published: (2024)
Seq1F1B: Efficient Sequence-Level Pipeline Parallelism for Large Language Model Training
by: Sun, Ao, et al.
Published: (2024)
SIMPLE: Disaggregating Sampling from GPU Inference into a Decision Plane for Faster Distributed LLM Serving
by: Zhao, Bohan, et al.
Published: (2025)
SiPipe: Bridging the CPU-GPU Utilization Gap for Efficient Pipeline-Parallel LLM Inference
by: He, Yongchao, et al.
Published: (2025)
Fast Distributed Inference Serving for Large Language Models
by: Wu, Bingyang, et al.
Published: (2023)
Broadcast in Almost Mixing Time
by: Paramonov, Anton, et al.
Published: (2025)
Mell: Memory-Efficient Large Language Model Serving via Multi-GPU KV Cache Management
by: Qianli, Liu, et al.
Published: (2025)
Fast Iterative Graph Computing with Updated Neighbor States
by: Zhou, Yijie, et al.
Published: (2024)
An Almost Tight Lower Bound for Plurality Consensus with Undecided State Dynamics in the Population Protocol Model
by: El-Hayek, Antoine, et al.
Published: (2025)
COPUS: Co-adaptive Parallelism and Batch Size Selection in Large Language Model Training
by: Sakip, Akhmed, et al.
Published: (2026)
Enhancing Memory Efficiency in Large Language Model Training Through Chronos-aware Pipeline Parallelism
by: Lin, Xinyuan, et al.
Published: (2025)
StarTrail: Concentric Ring Sequence Parallelism for Efficient Near-Infinite-Context Transformer Model Training
by: Liu, Ziming, et al.
Published: (2024)
Oases: Efficient Large-Scale Model Training on Commodity Servers via Overlapped and Automated Tensor Model Parallelism
by: Li, Shengwei, et al.
Published: (2023)
TierCheck: Tiered Checkpointing for Fault Tolerance in Large Language Model Training
by: Han, Shujie, et al.
Published: (2026)
DiffusionPipe: Training Large Diffusion Models with Efficient Pipelines
by: Tian, Ye, et al.
Published: (2024)
λScale: Enabling Fast Scaling for Serverless Large Language Model Inference
by: Yu, Minchen, et al.
Published: (2025)
W4A16 Mixed-Precision Matrix Multiplication on Decoupled Architecture: Kernel Design and Memory Bottleneck Analysis for Ascend NPUs
by: He, Yuanhong, et al.
Published: (2026)
DistTrain: Addressing Model and Data Heterogeneity with Disaggregated Training for Multimodal Large Language Models
by: Zhang, Zili, et al.
Published: (2024)
SWIFT: Expedited Failure Recovery for Large-scale DNN Training
by: Zhong, Yuchen, et al.
Published: (2023)
GMLake: Efficient and Transparent GPU Memory Defragmentation for Large-scale DNN Training with Virtual Memory Stitching
by: Guo, Cong, et al.
Published: (2024)
Sparse Checkpointing for Fast and Reliable MoE Training
by: Gandhi, Swapnil, et al.
Published: (2024)
Efficient Training of Large Language Models on Distributed Infrastructures: A Survey
by: Duan, Jiangfei, et al.
Published: (2024)
FLeeC: a Fast Lock-Free Application Cache
by: Costa, André J., et al.
Published: (2024)
CascadeInfer: Length-Aware Scheduling of LLM Serving with Low Latency and Load Balancing
by: Yuan, Yitao, et al.
Published: (2025)
Fast State Restoration in LLM Serving with HCache
by: Gao, Shiwei, et al.
Published: (2024)
Mangrove: Fast and Parallelizable State Replication for Blockchains
by: Paramonov, Anton, et al.
Published: (2025)
SeaLLM: Service-Aware and Latency-Optimized Resource Sharing for Large Language Model Inference
by: Zhao, Yihao, et al.
Published: (2025)
HexiScale: Facilitating Large Language Model Training over Heterogeneous Hardware
by: Yan, Ran, et al.
Published: (2024)
An Engineering Journey Training Large Language Models at Scale on Alps: The Apertus Experience
by: Coles, Jonathan, et al.
Published: (2026)
An Explorative Study on Distributed Computing Techniques in Training and Inference of Large Language Models
by: Hakim, Sheikh Azizul, et al.
Published: (2025)
HydraInfer: Hybrid Disaggregated Scheduling for Multimodal Large Language Model Serving
by: Dong, Xianzhe, et al.
Published: (2025)
FastMPS: Revisit Data Parallel in Large-scale Matrix Product State Sampling
by: Chen, Yaojian, et al.
Published: (2025)
PipeBoost: Resilient Pipelined Architecture for Fast Serverless LLM Scaling
by: Liu, Chongpeng, et al.
Published: (2025)
Building State Machine Replication Using Practical Network Synchrony
by: Wan, Yiliang, et al.
Published: (2025)