| Main Authors: | Feng, Yicheng, Chen, Yuetao, Chen, Kaiwen, Li, Jingzong, Wu, Tianyuan, Cheng, Peng, Wu, Chuan, Wang, Wei, Ho, Tsung-Yi, Xu, Hong |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2412.12487 |
Similar Items
Heta: Distributed Training of Heterogeneous Graph Neural Networks
by: Zhong, Yuchen, et al.
Published: (2024)
DSV: Exploiting Dynamic Sparsity to Accelerate Large-Scale Video DiT Training
by: Tan, Xin, et al.
Published: (2025)
MegaScale-Omni: A Hyper-Scale, Workload-Resilient System for MultiModal LLM Training in Production
by: Xue, Chunyu, et al.
Published: (2026)
FALCON: Pinpointing and Mitigating Stragglers for Large-Scale Hybrid-Parallel Training
by: Wu, Tianyuan, et al.
Published: (2024)
ACE-Sync: An Adaptive Cloud-Edge Synchronization Framework for Communication-Efficient Large-Scale Distributed Model Training
by: Yang, Yi, et al.
Published: (2025)
PRISM: Probabilistic Runtime Insights and Scalable Performance Modeling for Large-Scale Distributed Training
by: Golden, Alicia, et al.
Published: (2025)
QSync: Quantization-Minimized Synchronous Distributed Training Across Hybrid Devices
by: Zhao, Juntao, et al.
Published: (2024)
MegaScale-Data: Scaling Dataloader for Multisource Large Foundation Model Training
by: Zhao, Juntao, et al.
Published: (2025)
BurstEngine: an Efficient Distributed Framework for Training Transformers on Extremely Long Sequences of over 1M Tokens
by: Sun, Ao, et al.
Published: (2025)
AReaL-Hex: Accommodating Asynchronous RL Training over Heterogeneous GPUs
by: Yan, Ran, et al.
Published: (2025)
HexiSeq: Accommodating Long Context Training of LLMs over Heterogeneous Hardware
by: Liang, Yan, et al.
Published: (2026)
HAP: SPMD DNN Training on Heterogeneous GPU Clusters with Automated Program Synthesis
by: Zhang, Shiwei, et al.
Published: (2024)
On-the-fly Communication-and-Computing to Enable Representation Learning for Distributed Point Clouds
by: Chen, Xu, et al.
Published: (2024)
Adaptra: Straggler-Resilient Hybrid-Parallel Training with Pipeline Adaptation
by: Wu, Tianyuan, et al.
Published: (2025)
MegatronApp: Efficient and Comprehensive Management on Distributed LLM Training
by: Zhao, Bohan, et al.
Published: (2025)
DeepServe: Serverless Large Language Model Serving at Scale
by: Hu, Junhao, et al.
Published: (2025)
RollMux: Phase-Level Multiplexing for Disaggregated RL Post-Training
by: Wu, Tianyuan, et al.
Published: (2025)
Optimizing Distributed Deployment of Mixture-of-Experts Model Inference in Serverless Computing
by: Liu, Mengfan, et al.
Published: (2025)
SP-MoE: Speculative Decoding and Prefetching for Accelerating MoE-based Model Inference
by: Chen, Liangkun, et al.
Published: (2025)
SWIFT: Expedited Failure Recovery for Large-scale DNN Training
by: Zhong, Yuchen, et al.
Published: (2023)
DiffusionPipe: Training Large Diffusion Models with Efficient Pipelines
by: Tian, Ye, et al.
Published: (2024)
Accelerating Distributed MoE Training and Inference with Lina
by: Li, Jiamin, et al.
Published: (2022)
Mosaic: Towards Efficient Training of Multimodal Models with Spatial Resource Multiplexing
by: Wang, Yanbo, et al.
Published: (2026)
CaraServe: CPU-Assisted and Rank-Aware LoRA Serving for Generative LLM Inference
by: Li, Suyi, et al.
Published: (2024)
DiT-HC: Enabling Efficient Training of Visual Generation Model DiT on HPC-oriented CPU Cluster
by: Zhang, Jinxiao, et al.
Published: (2026)
Scaling Large-scale GNN Training to Thousands of Processors on CPU-based Supercomputers
by: Zhuang, Chen, et al.
Published: (2024)
Optimizing Distributed Training Approaches for Scaling Neural Networks
by: Baligodugula, Vishnu Vardhan, et al.
Published: (2025)
TurboGR: An Accelerated Training System for Large-Scale Generative Recommendation
by: Chai, Huichao, et al.
Published: (2026)
Hybrid Dual-Batch and Cyclic Progressive Learning for Efficient Distributed Training
by: Lu, Kuan-Wei, et al.
Published: (2025)
MTGenRec: An Efficient Distributed Training System for Generative Recommendation Models in Meituan
by: Wang, Yuxiang, et al.
Published: (2025)
ModTrans: Translating Real-world Models for Distributed Training Simulator
by: Lyu, Yi
Published: (2026)
Communication-Efficient Sparsely-Activated Model Training via Sequence Migration and Token Condensation
by: Chen, Fahao, et al.
Published: (2024)
Lagom: Unleashing the Power of Communication and Computation Overlapping for Distributed LLM Training
by: Xu, Guanbin, et al.
Published: (2026)
Efficient Distributed MLLM Training with Cornstarch
by: Jang, Insu, et al.
Published: (2025)
LuWu: An End-to-End In-Network Out-of-Core Optimizer for 100B-Scale Model-in-Network Data-Parallel Training on Distributed GPUs
by: Sun, Mo, et al.
Published: (2024)
A Study on the Performance of Distributed Training of Data-driven CFD Simulations
by: Iserte, Sergio, et al.
Published: (2026)
AMDP: Asynchronous Multi-Directional Pipeline Parallelism for Large-Scale Models Training
by: Chen, Ling, et al.
Published: (2026)
Spatiotemporal Traffic Prediction in Distributed Backend Systems via Graph Neural Networks
by: Qiu, Zhimin, et al.
Published: (2025)
Will LLMs Scaling Hit the Wall? Breaking Barriers via Distributed Resources on Massive Edge Devices
by: Shen, Tao, et al.
Published: (2025)
Poplar: Efficient Scaling of Distributed DNN Training on Heterogeneous GPU Clusters
by: Zhang, WenZheng, et al.
Published: (2024)