| Main Authors: | Rajgopal, Ajay Navilarekal; Solmsdorf, Nikolai |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2605.07726 |
Similar Items
AMPED: Accelerating MTTKRP for Billion-Scale Sparse Tensor Decomposition on Multiple GPUs
by: Wijeratne, Sasindu, et al.
Published: (2025)
PRISM: Probabilistic Runtime Insights and Scalable Performance Modeling for Large-Scale Distributed Training
by: Golden, Alicia, et al.
Published: (2025)
Efficient Training of Large Language Models on Distributed Infrastructures: A Survey
by: Duan, Jiangfei, et al.
Published: (2024)
Accelerating Sparse MTTKRP for Small Tensor Decomposition on GPU
by: Wijeratne, Sasindu, et al.
Published: (2025)
Cloud-native and Distributed Systems for Efficient and Scalable Large Language Models -- A Research Agenda
by: Xu, Minxian, et al.
Published: (2026)
An Engineering Journey Training Large Language Models at Scale on Alps: The Apertus Experience
by: Coles, Jonathan, et al.
Published: (2026)
HexiScale: Facilitating Large Language Model Training over Heterogeneous Hardware
by: Yan, Ran, et al.
Published: (2024)
Oases: Efficient Large-Scale Model Training on Commodity Servers via Overlapped and Automated Tensor Model Parallelism
by: Li, Shengwei, et al.
Published: (2023)
ACE-Sync: An Adaptive Cloud-Edge Synchronization Framework for Communication-Efficient Large-Scale Distributed Model Training
by: Yang, Yi, et al.
Published: (2025)
RapidGNN: Communication Efficient Large-Scale Distributed Training of Graph Neural Networks
by: Niam, Arefin, et al.
Published: (2025)
SuperOffload: Unleashing the Power of Large-Scale LLM Training on Superchips
by: Lian, Xinyu, et al.
Published: (2025)
Thousand-GPU Large-Scale Training and Optimization Recipe for AI-Native Cloud Embodied Intelligence Infrastructure
by: Guo, Yongjian, et al.
Published: (2026)
H2: Towards Efficient Large-Scale LLM Training on Hyper-Heterogeneous Cluster over 1,000 Chips
by: Tang, Ding, et al.
Published: (2025)
DiffusionPipe: Training Large Diffusion Models with Efficient Pipelines
by: Tian, Ye, et al.
Published: (2024)
Efficient Parallelization Layouts for Large-Scale Distributed Model Training
by: Hagemann, Johannes, et al.
Published: (2023)
DistTrain: Addressing Model and Data Heterogeneity with Disaggregated Training for Multimodal Large Language Models
by: Zhang, Zili, et al.
Published: (2024)
InternEvo: Efficient Long-sequence Large Language Model Training via Hybrid Parallelism and Redundant Sharding
by: Chen, Qiaoling, et al.
Published: (2024)
Seq1F1B: Efficient Sequence-Level Pipeline Parallelism for Large Language Model Training
by: Sun, Ao, et al.
Published: (2024)
VeOmni: Scaling Any Modality Model Training with Model-Centric Distributed Recipe Zoo
by: Ma, Qianli, et al.
Published: (2025)
λScale: Enabling Fast Scaling for Serverless Large Language Model Inference
by: Yu, Minchen, et al.
Published: (2025)
DistFlow: A Fully Distributed RL Framework for Scalable and Efficient LLM Post-Training
by: Wang, Zhixin, et al.
Published: (2025)
A Federated and Parameter-Efficient Framework for Large Language Model Training in Medicine
by: Li, Anran, et al.
Published: (2026)
PALM: A Efficient Performance Simulator for Tiled Accelerators with Large-scale Model Training
by: Fang, Jiahao, et al.
Published: (2024)
DeepServe: Serverless Large Language Model Serving at Scale
by: Hu, Junhao, et al.
Published: (2025)
BOOST: BOttleneck-Optimized Scalable Training Framework for Low-Rank Large Language Models
by: Wang, Zhengyang, et al.
Published: (2025)
CFP: Efficient Optimization of Intra-Operator Parallelism Plans for Large Model Training
by: Hu, Weifang, et al.
Published: (2025)
Scaling Large Language Model Training on Frontier with Low-Bandwidth Partitioning
by: Xu, Lang, et al.
Published: (2025)
Benchmarking the Performance of Large Language Models on the Cerebras Wafer Scale Engine
by: Zhang, Zuoning, et al.
Published: (2024)
Poplar: Efficient Scaling of Distributed DNN Training on Heterogeneous GPU Clusters
by: Zhang, WenZheng, et al.
Published: (2024)
An Explorative Study on Distributed Computing Techniques in Training and Inference of Large Language Models
by: Hakim, Sheikh Azizul, et al.
Published: (2025)
Cascadia: An Efficient Cascade Serving System for Large Language Models
by: Jiang, Youhe, et al.
Published: (2025)
Sparse MTTKRP Acceleration for Tensor Decomposition on GPU
by: Wijeratne, Sasindu, et al.
Published: (2024)
MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs
by: Jiang, Ziheng, et al.
Published: (2024)
MegaScale-MoE: Large-Scale Communication-Efficient Training of Mixture-of-Experts Models in Production
by: Jin, Chao, et al.
Published: (2025)
SDSL-Solver: Scalable Distributed Sparse Linear Solvers for Large-Scale Interior Point Methods
by: Yang, Shaofeng, et al.
Published: (2026)
Unleashing Scalable Context Parallelism for Foundation Models Pre-Training via FCP
by: Zhao, Yilong, et al.
Published: (2026)
COPUS: Co-adaptive Parallelism and Batch Size Selection in Large Language Model Training
by: Sakip, Akhmed, et al.
Published: (2026)
FFTrainer: Fast Failover in Large-Language Model Training with Almost-Free State Management
by: Zhao, Bohan, et al.
Published: (2025)
Pier: Efficient Large Language Model pretraining with Relaxed Global Communication
by: Fan, Shuyuan, et al.
Published: (2025)
LLMTailor: A Layer-wise Tailoring Tool for Efficient Checkpointing of Large Language Models
by: Sun, Minqiu, et al.
Published: (2026)