| Main Authors: | Wang, Zhuang; Xu, Zhaozhuo; Xi, Jingyi; Wang, Yuke; Shrivastava, Anshumali; Ng, T. S. Eugene |
|---|---|
| Format: | Preprint |
| Published: | 2023 |
| Online Access: | https://arxiv.org/abs/2309.13254 |
Similar Items
70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float (DFloat11)
by: Zhang, Tianyi, et al.
Published: (2025)
QSync: Quantization-Minimized Synchronous Distributed Training Across Hybrid Devices
by: Zhao, Juntao, et al.
Published: (2024)
ALTO: Adaptive LoRA Tuning and Orchestration for Heterogeneous LoRA Training Workloads
by: Zuo, Jingwei, et al.
Published: (2026)
RollPacker: Mitigating Long-Tail Rollouts for Fast, Synchronous RL Post-Training
by: Gao, Wei, et al.
Published: (2025)
BatchWeave: A Consistent Object-Store-Native Data Plane for Large Foundation Model Training
by: Sun, Ting, et al.
Published: (2026)
Echo: Simulating Distributed Training At Scale
by: Feng, Yicheng, et al.
Published: (2024)
Agglomerative Federated Learning: Empowering Larger Model Training via End-Edge-Cloud Collaboration
by: Wu, Zhiyuan, et al.
Published: (2023)
Empowering Data Mesh with Federated Learning
by: Li, Haoyuan, et al.
Published: (2024)
CAFE: Carbon-Aware Federated Learning in Geographically Distributed Data Centers
by: Bian, Jieming, et al.
Published: (2023)
Faster Distributed Inference-Only Recommender Systems via Bounded Lag Synchronous Collectives
by: Dichev, Kiril, et al.
Published: (2025)
Efficient Data Distribution Estimation for Accelerated Federated Learning
by: Wang, Yuanli, et al.
Published: (2024)
Incentivizing Permissionless Distributed Learning of LLMs
by: Lidin, Joel, et al.
Published: (2025)
Parallel Track Transformers: Enabling Fast GPU Inference with Reduced Synchronization
by: Wang, Chong, et al.
Published: (2026)
Robust Fully-Asynchronous Methods for Distributed Training over General Architecture
by: Zhu, Zehan, et al.
Published: (2023)
ScaleLLM: A Resource-Frugal LLM Serving Framework by Optimizing End-to-End Efficiency
by: Yao, Yuhang, et al.
Published: (2024)
Characterizing the Efficiency of Distributed Training: A Power, Performance, and Thermal Perspective
by: Go, Seokjin, et al.
Published: (2025)
SiDA-MoE: Sparsity-Inspired Data-Aware Serving for Efficient and Scalable Large Mixture-of-Experts Models
by: Du, Zhixu, et al.
Published: (2023)
Accelerating Distributed ML Training via Selective Synchronization
by: Tyagi, Sahil, et al.
Published: (2023)
Distributed Training under Packet Loss
by: Weintraub, Erez, et al.
Published: (2025)
Empowering Federated Learning for Massive Models with NVIDIA FLARE
by: Roth, Holger R., et al.
Published: (2024)
Minder: Faulty Machine Detection for Large-scale Distributed Model Training
by: Deng, Yangtao, et al.
Published: (2024)
Spindle: Efficient Distributed Training of Multi-Task Large Models via Wavefront Scheduling
by: Wang, Yujie, et al.
Published: (2024)
AntDT: A Self-Adaptive Distributed Training Framework for Leader and Straggler Nodes
by: Xiao, Youshao, et al.
Published: (2024)
Seer: Online Context Learning for Fast Synchronous LLM Reinforcement Learning
by: Qin, Ruoyu, et al.
Published: (2025)
Comprehensive Evaluation of GNN Training Systems: A Data Management Perspective
by: Yuan, Hao, et al.
Published: (2023)
Resource Efficient Asynchronous Federated Learning for Digital Twin Empowered IoT Network
by: Chu, Shunfeng, et al.
Published: (2024)
Communication Optimization for Distributed Training: Architecture, Advances, and Opportunities
by: Wei, Yunze, et al.
Published: (2024)
Lazarus: Resilient and Elastic Training of Mixture-of-Experts Models
by: Wu, Yongji, et al.
Published: (2024)
Hyperdimensional Computing Empowered Federated Foundation Model over Wireless Networks for Metaverse
by: Ding, Yahao, et al.
Published: (2024)
Hydraulis: Balancing Large Transformer Model Training via Co-designing Parallel Strategies and Data Assignment
by: Li, Haoyang, et al.
Published: (2024)
Efficient Parallelization Layouts for Large-Scale Distributed Model Training
by: Hagemann, Johannes, et al.
Published: (2023)
Distributed Convolutional Neural Network Training on Mobile and Edge Clusters
by: Rama, Pranav, et al.
Published: (2024)
Mycroft: Tracing Dependencies in Collective Communication Towards Reliable LLM Training
by: Deng, Yangtao, et al.
Published: (2025)
DualSparse-MoE: Coordinating Tensor/Neuron-Level Sparsity with Expert Partition and Reconstruction
by: Cai, Weilin, et al.
Published: (2025)
Unicron: Economizing Self-Healing LLM Training at Scale
by: He, Tao, et al.
Published: (2023)
Covenant-72B: Pre-Training a 72B LLM with Trustless Peers Over-the-Internet
by: Lidin, Joel, et al.
Published: (2026)
An Experimental Comparison of Partitioning Strategies for Distributed Graph Neural Network Training
by: Merkel, Nikolai, et al.
Published: (2023)
SparDL: Distributed Deep Learning Training with Efficient Sparse Communication
by: Zhao, Minjun, et al.
Published: (2023)
Understanding Silent Data Corruption in LLM Training
by: Ma, Jeffrey, et al.
Published: (2025)
Fully Distributed Online Training of Graph Neural Networks in Networked Systems
by: Olshevskyi, Rostyslav, et al.
Published: (2024)