:: Library Catalog

Bibliographic Details
Main Authors:	Meng, Lin; Sun, Yuzhong
Format:	Preprint
Published:	2025
Subjects:	Distributed, Parallel, and Cluster Computing
Online Access:	https://arxiv.org/abs/2503.16815

Similar Items

DreamDDP: Accelerating Data Parallel Distributed LLM Training with Layer-wise Scheduled Partial Synchronization
by: Tang, Zhenheng, et al.
Published: (2025)

BlockRaFT: A Distributed Framework for Fault-Tolerant and Scalable Blockchain Nodes
by: Piduguralla, Manaswini, et al.
Published: (2026)

Bandwidth-Aware and Cost-Efficient Pipeline Parallel Scheduling in Geo-Distributed LLM Training
by: Zhang, Han, et al.
Published: (2026)

Task Scheduling in Geo-Distributed Computing: A Survey
by: Wu, Yujian, et al.
Published: (2025)

Data-Locality-Aware Task Assignment and Scheduling for Distributed Job Executions
by: Zhao, Hailiang, et al.
Published: (2024)

QoE-oriented Dependent Task Scheduling under Multi-dimensional QoS Constraints over Distributed Networks
by: Fan, Xuwei, et al.
Published: (2023)

DynaFlow: Transparent and Flexible Intra-Device Parallelism via Programmable Operator Scheduling
by: Pan, Yi, et al.
Published: (2026)

FlowMoE: A Scalable Pipeline Scheduling Framework for Distributed Mixture-of-Experts Training
by: Gao, Yunqi, et al.
Published: (2025)

Scheduling Data-Intensive Workloads in Large-Scale Distributed Systems: Trends and Challenges
by: Stavrinides, Georgios L., et al.
Published: (2025)

ACE-Sync: An Adaptive Cloud-Edge Synchronization Framework for Communication-Efficient Large-Scale Distributed Model Training
by: Yang, Yi, et al.
Published: (2025)

Raptor: Distributed Scheduling for Serverless Functions
by: Exton, Kevin, et al.
Published: (2024)

MemFine: Memory-Aware Fine-Grained Scheduling for MoE Training
by: Zhao, Lu, et al.
Published: (2025)

iDDS: Intelligent Distributed Dispatch and Scheduling for Workflow Orchestration
by: Guan, Wen, et al.
Published: (2025)

Lagom: Unleashing the Power of Communication and Computation Overlapping for Distributed LLM Training
by: Xu, Guanbin, et al.
Published: (2026)

CO2: Efficient Distributed Training with Full Communication-Computation Overlap
by: Sun, Weigao, et al.
Published: (2024)

FUSCO: High-Performance Distributed Data Shuffling via Transformation-Communication Fusion
by: Zhu, Zhuoran, et al.
Published: (2025)

Scheduling of Distributed Applications on the Computing Continuum: A Survey
by: Mehran, Narges, et al.
Published: (2024)

Trustworthy Scheduling for Big Data Applications
by: Tomaras, Dimitrios, et al.
Published: (2026)

Schedule-Level Shared-Prefix Reuse for LLM RL Training
by: Li, Pengbo, et al.
Published: (2026)

Parm: Efficient Training of Large Sparsely-Activated Models with Dedicated Schedules
by: Pan, Xinglin, et al.
Published: (2024)

RapidGNN: Communication Efficient Large-Scale Distributed Training of Graph Neural Networks
by: Niam, Arefin, et al.
Published: (2025)

Hiding Communication Cost in Distributed LLM Training via Micro-batch Co-execution
by: Wang, Haiquan, et al.
Published: (2024)

CondenseGraph: Communication-Efficient Distributed GNN Training via On-the-Fly Graph Condensation
by: Zhang, Zizhao, et al.
Published: (2026)

GreenDyGNN: Runtime-Adaptive Energy-Efficient Communication for Distributed GNN Training
by: Niam, Arefin, et al.
Published: (2026)

CrossPipe: Towards Optimal Pipeline Schedules for Cross-Datacenter Training
by: Chen, Tiancheng, et al.
Published: (2025)

Learning to Schedule: A Supervised Learning Framework for Network-Aware Scheduling of Data-Intensive Workloads
by: Timilsina, Sankalpa, et al.
Published: (2025)

A Study on the Performance of Distributed Training of Data-driven CFD Simulations
by: Iserte, Sergio, et al.
Published: (2026)

PruneX: A Hierarchical Communication-Efficient System for Distributed CNN Training with Structured Pruning
by: Olama, Alireza, et al.
Published: (2025)

Eventually-Consistent Federated Scheduling for Data Center Workloads
by: Thiyyakat, Meghana, et al.
Published: (2023)

Retrofitting Service Dependency Discovery in Distributed Systems
by: Landau, Diogo, et al.
Published: (2025)

Efficient Training of Large Language Models on Distributed Infrastructures: A Survey
by: Duan, Jiangfei, et al.
Published: (2024)

FedFT: Improving Communication Performance for Federated Learning with Frequency Space Transformation
by: Palihawadana, Chamath, et al.
Published: (2024)

Communication-Efficient Distributed Learning via Sparse and Adaptive Stochastic Gradient
by: Deng, Xiaoge, et al.
Published: (2021)

DaggerFFT: A Distributed FFT Framework Using Task Scheduling in Julia
by: Anvari, Sana Taghipour, et al.
Published: (2026)

LuWu: An End-to-End In-Network Out-of-Core Optimizer for 100B-Scale Model-in-Network Data-Parallel Training on Distributed GPUs
by: Sun, Mo, et al.
Published: (2024)

Metronome: Efficient Scheduling for Periodic Traffic Jobs with Network and Priority Awareness
by: Jiang, Hao, et al.
Published: (2025)

A Flexible Programmable Pipeline Parallelism Framework for Efficient DNN Training
by: Jiang, Lijuan, et al.
Published: (2025)

Distributed Load Balancing with Workload-Dependent Service Rates
by: Zhang, Wenxin, et al.
Published: (2024)

A Reinforcement Learning-Driven Task Scheduling Algorithm for Multi-Tenant Distributed Systems
by: Zhang, Xiaopei, et al.
Published: (2025)

Optimizing Frequent Checkpointing via Low-Cost Differential for Distributed Training Systems
by: Yao, Chenxuan, et al.
Published: (2025)