| Main Authors: | Wu, Xin, Teng, Fei, Li, Xingwang, Zheng, Bin, Duan, Qiang |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2603.25378 |
Similar Items
PAL: A Variability-Aware Policy for Scheduling ML Workloads in GPU Clusters
by: Jain, Rutwik, et al.
Published: (2024)
MERBIT: A GPU-Based SpMV Method for Iterative Workloads
by: Zhang, Qi, et al.
Published: (2026)
CRIUgpu: Transparent Checkpointing of GPU-Accelerated Workloads
by: Stoyanov, Radostin, et al.
Published: (2025)
Beyond Microservices: Testing Web-Scale RCA Methods on GPU-Driven LLM Workloads
by: Scheinert, Dominik, et al.
Published: (2026)
Minos: Systematically Classifying Performance and Power Characteristics of GPU Workloads on HPC Clusters
by: Jain, Rutwik, et al.
Published: (2026)
Poplar: Efficient Scaling of Distributed DNN Training on Heterogeneous GPU Clusters
by: Zhang, WenZheng, et al.
Published: (2024)
PRISM: Probabilistic Runtime Insights and Scalable Performance Modeling for Large-Scale Distributed Training
by: Golden, Alicia, et al.
Published: (2025)
Hybrid Learning and Optimization-Based Dynamic Scheduling for DL Workloads on Heterogeneous GPU Clusters
by: Dongare, Shruti, et al.
Published: (2025)
Resource Optimization with MPI Process Malleability for Dynamic Workloads in HPC Clusters
by: Iserte, Sergio, et al.
Published: (2025)
PPipe: Efficient Video Analytics Serving on Heterogeneous GPU Clusters via Pool-Based Pipeline Parallelism
by: Kong, Z. Jonny, et al.
Published: (2025)
Prediction-Assisted Online Distributed Deep Learning Workload Scheduling in GPU Clusters
by: Luo, Ziyue, et al.
Published: (2025)
Characterizing Production GPU Workloads using System-wide Telemetry Data
by: Cankur, Onur, et al.
Published: (2025)
KIS-S: A GPU-Aware Kubernetes Inference Simulator with RL-Based Auto-Scaling
by: Zhang, Guilin, et al.
Published: (2025)
Dynamic Client Clustering, Bandwidth Allocation, and Workload Optimization for Semi-synchronous Federated Learning
by: Yu, Liangkun, et al.
Published: (2024)
Cronus: Efficient LLM inference on Heterogeneous GPU Clusters via Partially Disaggregated Prefill
by: Liu, Yunzhao, et al.
Published: (2025)
MegaScale-Omni: A Hyper-Scale, Workload-Resilient System for MultiModal LLM Training in Production
by: Xue, Chunyu, et al.
Published: (2026)
Matrix representation and GPU-optimized parallel B-spline computing
by: Wu, Jiayu, et al.
Published: (2025)
Zeppelin: Balancing Variable-length Workloads in Data Parallel Large Model Training
by: Chen, Chang, et al.
Published: (2025)
An Online Fragmentation-Aware Scheduler for Managing GPU-Sharing Workloads on Multi-Instance GPUs
by: Ting, Hsu-Tzu, et al.
Published: (2025)
ServeGen: Workload Characterization and Generation of Large Language Model Serving in Production
by: Xiang, Yuxing, et al.
Published: (2025)
Scheduling Data-Intensive Workloads in Large-Scale Distributed Systems: Trends and Challenges
by: Stavrinides, Georgios L., et al.
Published: (2025)
An Efficient, Reliable and Observable Collective Communication Library in Large-scale GPU Training Clusters
by: Zhang, Mingjun, et al.
Published: (2025)
FLAME: A Serving System Optimized for Large-Scale Generative Recommendation with Efficiency
by: Guo, Xianwen, et al.
Published: (2025)
ParvaGPU: Efficient Spatial GPU Sharing for Large-Scale DNN Inference in Cloud Environments
by: Lee, Munkyu, et al.
Published: (2024)
GPUOS: A GPU Operating System Primitive for Transparent Operation Fusion
by: Yang, Yiwei, et al.
Published: (2026)
Large Scale Multi-GPU Based Parallel Traffic Simulation for Accelerated Traffic Assignment and Propagation
by: Jiang, Xuan, et al.
Published: (2024)
High-Performance Portable GPU Primitives for Arbitrary Types and Operators in Julia
by: Pilliat, Emmanuel
Published: (2026)
Hetis: Serving LLMs in Heterogeneous GPU Clusters with Fine-grained and Dynamic Parallelism
by: Mo, Zizhao, et al.
Published: (2025)
Fantasy: Efficient Large-scale Vector Search on GPU Clusters with GPUDirect Async
by: Liu, Yi, et al.
Published: (2025)
Predictable LLM Serving on GPU Clusters
by: Darzi, Erfan, et al.
Published: (2025)
Scheduling Deep Learning Jobs in Multi-Tenant GPU Clusters via Wise Resource Sharing
by: Luo, Yizhou, et al.
Published: (2024)
Improving GPU Multi-Tenancy Through Dynamic Multi-Instance GPU Reconfiguration
by: Wang, Tianyu, et al.
Published: (2024)
Evaluating Malleable Job Scheduling in HPC Clusters using Real-World Workloads
by: Zojer, Patrick, et al.
Published: (2026)
Dispatching Odyssey: Exploring Performance in Computing Clusters under Real-world Workloads
by: Yildiz, Mert, et al.
Published: (2025)
GFS: A Preemption-aware Scheduling Framework for GPU Clusters with Predictive Spot Instance Management
by: Duan, Jiaang, et al.
Published: (2025)
TileLink: Generating Efficient Compute-Communication Overlapping Kernels using Tile-Centric Primitives
by: Zheng, Size, et al.
Published: (2025)
FlashMem: Supporting Modern DNN Workloads on Mobile with GPU Memory Hierarchy Optimizations
by: Shu, Zhihao, et al.
Published: (2026)
HARP: Orchestrating Automated Parallel Training on Heterogeneous GPU Clusters
by: Liang, Antian, et al.
Published: (2025)
Crossword: Adaptive Consensus for Dynamic Data-Heavy Workloads
by: Hu, Guanzhou, et al.
Published: (2025)
Towards Cloud Efficiency with Large-scale Workload Characterization
by: Parayil, Anjaly, et al.
Published: (2024)