:: Library Catalog

Bibliographic Details
Main Authors:	Wu, Xin; Teng, Fei; Li, Xingwang; Zheng, Bin; Duan, Qiang
Format:	Preprint
Published:	2026
Subjects:	Distributed, Parallel, and Cluster Computing
Online Access:	https://arxiv.org/abs/2603.25378

Similar Items

PAL: A Variability-Aware Policy for Scheduling ML Workloads in GPU Clusters
by: Jain, Rutwik, et al.
Published: (2024)

MERBIT: A GPU-Based SpMV Method for Iterative Workloads
by: Zhang, Qi, et al.
Published: (2026)

CRIUgpu: Transparent Checkpointing of GPU-Accelerated Workloads
by: Stoyanov, Radostin, et al.
Published: (2025)

Beyond Microservices: Testing Web-Scale RCA Methods on GPU-Driven LLM Workloads
by: Scheinert, Dominik, et al.
Published: (2026)

Minos: Systematically Classifying Performance and Power Characteristics of GPU Workloads on HPC Clusters
by: Jain, Rutwik, et al.
Published: (2026)

Poplar: Efficient Scaling of Distributed DNN Training on Heterogeneous GPU Clusters
by: Zhang, WenZheng, et al.
Published: (2024)

PRISM: Probabilistic Runtime Insights and Scalable Performance Modeling for Large-Scale Distributed Training
by: Golden, Alicia, et al.
Published: (2025)

Hybrid Learning and Optimization-Based Dynamic Scheduling for DL Workloads on Heterogeneous GPU Clusters
by: Dongare, Shruti, et al.
Published: (2025)

Resource Optimization with MPI Process Malleability for Dynamic Workloads in HPC Clusters
by: Iserte, Sergio, et al.
Published: (2025)

PPipe: Efficient Video Analytics Serving on Heterogeneous GPU Clusters via Pool-Based Pipeline Parallelism
by: Kong, Z. Jonny, et al.
Published: (2025)

Prediction-Assisted Online Distributed Deep Learning Workload Scheduling in GPU Clusters
by: Luo, Ziyue, et al.
Published: (2025)

Characterizing Production GPU Workloads using System-wide Telemetry Data
by: Cankur, Onur, et al.
Published: (2025)

KIS-S: A GPU-Aware Kubernetes Inference Simulator with RL-Based Auto-Scaling
by: Zhang, Guilin, et al.
Published: (2025)

Dynamic Client Clustering, Bandwidth Allocation, and Workload Optimization for Semi-synchronous Federated Learning
by: Yu, Liangkun, et al.
Published: (2024)

Cronus: Efficient LLM inference on Heterogeneous GPU Clusters via Partially Disaggregated Prefill
by: Liu, Yunzhao, et al.
Published: (2025)

MegaScale-Omni: A Hyper-Scale, Workload-Resilient System for MultiModal LLM Training in Production
by: Xue, Chunyu, et al.
Published: (2026)

Matrix representation and GPU-optimized parallel B-spline computing
by: Wu, Jiayu, et al.
Published: (2025)

Zeppelin: Balancing Variable-length Workloads in Data Parallel Large Model Training
by: Chen, Chang, et al.
Published: (2025)

An Online Fragmentation-Aware Scheduler for Managing GPU-Sharing Workloads on Multi-Instance GPUs
by: Ting, Hsu-Tzu, et al.
Published: (2025)

ServeGen: Workload Characterization and Generation of Large Language Model Serving in Production
by: Xiang, Yuxing, et al.
Published: (2025)

Scheduling Data-Intensive Workloads in Large-Scale Distributed Systems: Trends and Challenges
by: Stavrinides, Georgios L., et al.
Published: (2025)

An Efficient, Reliable and Observable Collective Communication Library in Large-scale GPU Training Clusters
by: Zhang, Mingjun, et al.
Published: (2025)

FLAME: A Serving System Optimized for Large-Scale Generative Recommendation with Efficiency
by: Guo, Xianwen, et al.
Published: (2025)

ParvaGPU: Efficient Spatial GPU Sharing for Large-Scale DNN Inference in Cloud Environments
by: Lee, Munkyu, et al.
Published: (2024)

GPUOS: A GPU Operating System Primitive for Transparent Operation Fusion
by: Yang, Yiwei, et al.
Published: (2026)

Large Scale Multi-GPU Based Parallel Traffic Simulation for Accelerated Traffic Assignment and Propagation
by: Jiang, Xuan, et al.
Published: (2024)

High-Performance Portable GPU Primitives for Arbitrary Types and Operators in Julia
by: Pilliat, Emmanuel
Published: (2026)

Hetis: Serving LLMs in Heterogeneous GPU Clusters with Fine-grained and Dynamic Parallelism
by: Mo, Zizhao, et al.
Published: (2025)

Fantasy: Efficient Large-scale Vector Search on GPU Clusters with GPUDirect Async
by: Liu, Yi, et al.
Published: (2025)

Predictable LLM Serving on GPU Clusters
by: Darzi, Erfan, et al.
Published: (2025)

Scheduling Deep Learning Jobs in Multi-Tenant GPU Clusters via Wise Resource Sharing
by: Luo, Yizhou, et al.
Published: (2024)

Improving GPU Multi-Tenancy Through Dynamic Multi-Instance GPU Reconfiguration
by: Wang, Tianyu, et al.
Published: (2024)

Evaluating Malleable Job Scheduling in HPC Clusters using Real-World Workloads
by: Zojer, Patrick, et al.
Published: (2026)

Dispatching Odyssey: Exploring Performance in Computing Clusters under Real-world Workloads
by: Yildiz, Mert, et al.
Published: (2025)

GFS: A Preemption-aware Scheduling Framework for GPU Clusters with Predictive Spot Instance Management
by: Duan, Jiaang, et al.
Published: (2025)

TileLink: Generating Efficient Compute-Communication Overlapping Kernels using Tile-Centric Primitives
by: Zheng, Size, et al.
Published: (2025)

FlashMem: Supporting Modern DNN Workloads on Mobile with GPU Memory Hierarchy Optimizations
by: Shu, Zhihao, et al.
Published: (2026)

HARP: Orchestrating Automated Parallel Training on Heterogeneous GPU Clusters
by: Liang, Antian, et al.
Published: (2025)

Crossword: Adaptive Consensus for Dynamic Data-Heavy Workloads
by: Hu, Guanzhou, et al.
Published: (2025)

Towards Cloud Efficiency with Large-scale Workload Characterization
by: Parayil, Anjaly, et al.
Published: (2024)