:: Library Catalog

Cover Image

Saved in:

Bibliographic Details
Main Authors:	Feng, Yicheng, Chen, Yuetao, Chen, Kaiwen, Li, Jingzong, Wu, Tianyuan, Cheng, Peng, Wu, Chuan, Wang, Wei, Ho, Tsung-Yi, Xu, Hong
Format:	Preprint
Published:	2024
Subjects:	Machine Learning Distributed, Parallel, and Cluster Computing
Online Access:	https://arxiv.org/abs/2412.12487
Tags:	Add Tag No Tags, Be the first to tag this record!

Similar Items

Heta: Distributed Training of Heterogeneous Graph Neural Networks
by: Zhong, Yuchen, et al.
Published: (2024)

DSV: Exploiting Dynamic Sparsity to Accelerate Large-Scale Video DiT Training
by: Tan, Xin, et al.
Published: (2025)

MegaScale-Omni: A Hyper-Scale, Workload-Resilient System for MultiModal LLM Training in Production
by: Xue, Chunyu, et al.
Published: (2026)

FALCON: Pinpointing and Mitigating Stragglers for Large-Scale Hybrid-Parallel Training
by: Wu, Tianyuan, et al.
Published: (2024)

ACE-Sync: An Adaptive Cloud-Edge Synchronization Framework for Communication-Efficient Large-Scale Distributed Model Training
by: Yang, Yi, et al.
Published: (2025)

PRISM: Probabilistic Runtime Insights and Scalable Performance Modeling for Large-Scale Distributed Training
by: Golden, Alicia, et al.
Published: (2025)

QSync: Quantization-Minimized Synchronous Distributed Training Across Hybrid Devices
by: Zhao, Juntao, et al.
Published: (2024)

MegaScale-Data: Scaling Dataloader for Multisource Large Foundation Model Training
by: Zhao, Juntao, et al.
Published: (2025)

BurstEngine: an Efficient Distributed Framework for Training Transformers on Extremely Long Sequences of over 1M Tokens
by: Sun, Ao, et al.
Published: (2025)

AReaL-Hex: Accommodating Asynchronous RL Training over Heterogeneous GPUs
by: Yan, Ran, et al.
Published: (2025)

HexiSeq: Accommodating Long Context Training of LLMs over Heterogeneous Hardware
by: Liang, Yan, et al.
Published: (2026)

HAP: SPMD DNN Training on Heterogeneous GPU Clusters with Automated Program Synthesis
by: Zhang, Shiwei, et al.
Published: (2024)

On-the-fly Communication-and-Computing to Enable Representation Learning for Distributed Point Clouds
by: Chen, Xu, et al.
Published: (2024)

Adaptra: Straggler-Resilient Hybrid-Parallel Training with Pipeline Adaptation
by: Wu, Tianyuan, et al.
Published: (2025)

MegatronApp: Efficient and Comprehensive Management on Distributed LLM Training
by: Zhao, Bohan, et al.
Published: (2025)

DeepServe: Serverless Large Language Model Serving at Scale
by: Hu, Junhao, et al.
Published: (2025)

RollMux: Phase-Level Multiplexing for Disaggregated RL Post-Training
by: Wu, Tianyuan, et al.
Published: (2025)

Optimizing Distributed Deployment of Mixture-of-Experts Model Inference in Serverless Computing
by: Liu, Mengfan, et al.
Published: (2025)

SP-MoE: Speculative Decoding and Prefetching for Accelerating MoE-based Model Inference
by: Chen, Liangkun, et al.
Published: (2025)

SWIFT: Expedited Failure Recovery for Large-scale DNN Training
by: Zhong, Yuchen, et al.
Published: (2023)

DiffusionPipe: Training Large Diffusion Models with Efficient Pipelines
by: Tian, Ye, et al.
Published: (2024)

Accelerating Distributed MoE Training and Inference with Lina
by: Li, Jiamin, et al.
Published: (2022)

Mosaic: Towards Efficient Training of Multimodal Models with Spatial Resource Multiplexing
by: Wang, Yanbo, et al.
Published: (2026)

CaraServe: CPU-Assisted and Rank-Aware LoRA Serving for Generative LLM Inference
by: Li, Suyi, et al.
Published: (2024)

DiT-HC: Enabling Efficient Training of Visual Generation Model DiT on HPC-oriented CPU Cluster
by: Zhang, Jinxiao, et al.
Published: (2026)

Scaling Large-scale GNN Training to Thousands of Processors on CPU-based Supercomputers
by: Zhuang, Chen, et al.
Published: (2024)

Optimizing Distributed Training Approaches for Scaling Neural Networks
by: Baligodugula, Vishnu Vardhan, et al.
Published: (2025)

TurboGR: An Accelerated Training System for Large-Scale Generative Recommendation
by: Chai, Huichao, et al.
Published: (2026)

Hybrid Dual-Batch and Cyclic Progressive Learning for Efficient Distributed Training
by: Lu, Kuan-Wei, et al.
Published: (2025)

MTGenRec: An Efficient Distributed Training System for Generative Recommendation Models in Meituan
by: Wang, Yuxiang, et al.
Published: (2025)

ModTrans: Translating Real-world Models for Distributed Training Simulator
by: Lyu, Yi
Published: (2026)

Communication-Efficient Sparsely-Activated Model Training via Sequence Migration and Token Condensation
by: Chen, Fahao, et al.
Published: (2024)

Lagom: Unleashing the Power of Communication and Computation Overlapping for Distributed LLM Training
by: Xu, Guanbin, et al.
Published: (2026)

Efficient Distributed MLLM Training with Cornstarch
by: Jang, Insu, et al.
Published: (2025)

LuWu: An End-to-End In-Network Out-of-Core Optimizer for 100B-Scale Model-in-Network Data-Parallel Training on Distributed GPUs
by: Sun, Mo, et al.
Published: (2024)

A Study on the Performance of Distributed Training of Data-driven CFD Simulations
by: Iserte, Sergio, et al.
Published: (2026)

AMDP: Asynchronous Multi-Directional Pipeline Parallelism for Large-Scale Models Training
by: Chen, Ling, et al.
Published: (2026)

Spatiotemporal Traffic Prediction in Distributed Backend Systems via Graph Neural Networks
by: Qiu, Zhimin, et al.
Published: (2025)

Will LLMs Scaling Hit the Wall? Breaking Barriers via Distributed Resources on Massive Edge Devices
by: Shen, Tao, et al.
Published: (2025)

Poplar: Efficient Scaling of Distributed DNN Training on Heterogeneous GPU Clusters
by: Zhang, WenZheng, et al.
Published: (2024)