| Main Authors: | Wang, Zezhou, Li, Youjie, Lin, Zhiqi, Yang, Jiacheng, Xie, Cong, Feng, Guanyu, Zhong, Zheng, Huang, Ziyue, Zhu, Hongyu, Zhang, Zhi, Peng, Yanghua, Liu, Xin |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2602.22437 |
Similar Items
veScale: Consistent and Efficient Tensor Programming with Eager-Mode SPMD
by: Li, Youjie, et al.
Published: (2025)
SimpleFSDP: Simpler Fully Sharded Data Parallel with torch.compile
by: Zhang, Ruisi, et al.
Published: (2024)
Performance Characterization of Distributed Deep Learning Strategies: A Quantitative Evaluation of DDP, FSDP, and Parameter Server Architectures on GPU Clusters
by: Ovi, Md Sultanul Islam
Published: (2025)
VeOmni: Scaling Any Modality Model Training with Model-Centric Distributed Recipe Zoo
by: Ma, Qianli, et al.
Published: (2025)
MegaScale-MoE: Large-Scale Communication-Efficient Training of Mixture-of-Experts Models in Production
by: Jin, Chao, et al.
Published: (2025)
MegaScale-Omni: A Hyper-Scale, Workload-Resilient System for MultiModal LLM Training in Production
by: Xue, Chunyu, et al.
Published: (2026)
MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs
by: Jiang, Ziheng, et al.
Published: (2024)
Exploring Uncore Frequency Scaling for Heterogeneous Computing
by: Zheng, Zhong, et al.
Published: (2025)
Optimus: Accelerating Large-Scale Multi-Modal LLM Training by Bubble Exploitation
by: Feng, Weiqi, et al.
Published: (2024)
MegaScale-Data: Scaling Dataloader for Multisource Large Foundation Model Training
by: Zhao, Juntao, et al.
Published: (2025)
FlexKV: Flexible Index Offloading for Memory-Disaggregated Key-Value Store
by: Hu, Zhisheng, et al.
Published: (2025)
PRISM: Dynamic Primitive-Based Forecasting for Large-Scale GPU Cluster Workloads
by: Wu, Xin, et al.
Published: (2026)
Benchmarking the Performance of Large Language Models on the Cerebras Wafer Scale Engine
by: Zhang, Zuoning, et al.
Published: (2024)
Comparing Cross-Platform Performance via Node-to-Node Scaling Studies
by: Weiss, Kenneth, et al.
Published: (2025)
Data Caching for Enterprise-Grade Petabyte-Scale OLAP
by: Tang, Chunxu, et al.
Published: (2024)
EdgeVision: Towards Collaborative Video Analytics on Distributed Edges for Performance Maximization
by: Gao, Guanyu, et al.
Published: (2022)
EcoShift: Performance-Aware Power Management for Power-Constrained Heterogeneous Systems
by: Zheng, Zhong, et al.
Published: (2026)
madupite: A High-Performance Distributed Solver for Large-Scale Markov Decision Processes
by: Gargiani, Matilde, et al.
Published: (2025)
PRISM: Probabilistic Runtime Insights and Scalable Performance Modeling for Large-Scale Distributed Training
by: Golden, Alicia, et al.
Published: (2025)
Poplar: Efficient Scaling of Distributed DNN Training on Heterogeneous GPU Clusters
by: Zhang, WenZheng, et al.
Published: (2024)
StatuScale: Status-aware and Elastic Scaling Strategy for Microservice Applications
by: Wen, Linfeng, et al.
Published: (2024)
DeepServe: Serverless Large Language Model Serving at Scale
by: Hu, Junhao, et al.
Published: (2025)
PolarStore: High-Performance Data Compression for Large-Scale Cloud-Native Databases
by: Hu, Qingda, et al.
Published: (2025)
ReviveMoE: Fast Recovery for Hardware Failures in Large-Scale MoE LLM Inference Deployments
by: Li, Haley, et al.
Published: (2026)
λScale: Enabling Fast Scaling for Serverless Large Language Model Inference
by: Yu, Minchen, et al.
Published: (2025)
M$^2$-MFP: A Multi-Scale and Multi-Level Memory Failure Prediction Framework for Reliable Cloud Infrastructure
by: Xie, Hongyi, et al.
Published: (2025)
Optimizing High-Throughput Distributed Data Pipelines for Reproducible Deep Learning at Scale
by: Mittal, Kashish, et al.
Published: (2026)
A Tale of Two Scales: Reconciling Horizontal and Vertical Scaling for Inference Serving Systems
by: Razavi, Kamran, et al.
Published: (2024)
Case Study: Performance Analysis of a Virtualized XRootD Frontend in Large-Scale WAN Transfers
by: da Silva, J M, et al.
Published: (2026)
Deep Learning-Enabled Supercritical Flame Simulation at Detailed Chemistry and Real-Fluid Accuracy Towards Trillion-Cell Scale
by: Guo, Zhuoqiang, et al.
Published: (2025)
FAIR Ecosystems for Science at Scale
by: Wilkinson, Sean R., et al.
Published: (2025)
Scaling MPI Applications on Aurora
by: Ibeid, Huda, et al.
Published: (2025)
Steering a Fleet: Adaptation for Large-Scale, Workflow-Based Experiments
by: Pruyne, Jim, et al.
Published: (2024)
MPI-Q: A Message Communication Library for Large-Scale Classical-Quantum Heterogeneous Hybrid Distributed Computing
by: Wang, Feng, et al.
Published: (2026)
Scaling Real-Time Traffic Analytics on Edge-Cloud Fabrics for City-Scale Camera Networks
by: Sharma, Akash, et al.
Published: (2026)
SDSL-Solver: Scalable Distributed Sparse Linear Solvers for Large-Scale Interior Point Methods
by: Yang, Shaofeng, et al.
Published: (2026)
Barycentric Coded Distributed Computing with Flexible Recovery Threshold for Collaborative Mobile Edge Computing
by: Qiu, Houming, et al.
Published: (2025)
MegaScale-Infer: Serving Mixture-of-Experts at Scale with Disaggregated Expert Parallelism
by: Zhu, Ruidong, et al.
Published: (2025)
ScalePool: Hybrid XLink-CXL Fabric for Composable Resource Disaggregation in Unified Scale-up Domains
by: Woo, Hyein, et al.
Published: (2025)
Understanding Inference Scaling for LLMs: Bottlenecks, Trade-offs, and Performance Principles
by: Arif, Moiz, et al.
Published: (2026)