| Main Authors: | Sharif, Mayira; Han, Guangzeng; Liu, Weisi; Huang, Xiaolei |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2504.14786 |
Similar Items
BandPilot: Towards Performance- and Contention-Aware GPU Dispatching in AI Clusters
by: Zhang, Kunming, et al.
Published: (2025)
gZCCL: Compression-Accelerated Collective Communication Framework for GPU Clusters
by: Huang, Jiajun, et al.
Published: (2023)
Predictable LLM Serving on GPU Clusters
by: Darzi, Erfan, et al.
Published: (2025)
GPU Programming for AI Workflow Development on AWS SageMaker: An Instructional Approach
by: Srinivasan, Sriram, et al.
Published: (2025)
Fantasy: Efficient Large-scale Vector Search on GPU Clusters with GPUDirect Async
by: Liu, Yi, et al.
Published: (2025)
Optimal Resource Efficiency with Fairness in Heterogeneous GPU Clusters
by: Mo, Zizhao, et al.
Published: (2024)
DeepOps & SLURM: Your GPU Cluster Guide
by: Majee, Arindam
Published: (2024)
An Efficient, Reliable and Observable Collective Communication Library in Large-scale GPU Training Clusters
by: Zhang, Mingjun, et al.
Published: (2025)
Cronus: Efficient LLM inference on Heterogeneous GPU Clusters via Partially Disaggregated Prefill
by: Liu, Yunzhao, et al.
Published: (2025)
HARP: Orchestrating Automated Parallel Training on Heterogeneous GPU Clusters
by: Liang, Antian, et al.
Published: (2025)
Zorse: Optimizing LLM Training Efficiency on Heterogeneous GPU Clusters
by: Guo, Runsheng Benson, et al.
Published: (2025)
Cephalo: Harnessing Heterogeneous GPU Clusters for Training Transformer Models
by: Guo, Runsheng Benson, et al.
Published: (2024)
A Practical GPU-Accelerated Implementation of Orthogonal Matching Pursuit
by: Lubonja, Ariel, et al.
Published: (2024)
Poplar: Efficient Scaling of Distributed DNN Training on Heterogeneous GPU Clusters
by: Zhang, WenZheng, et al.
Published: (2024)
The Energy Cost of Execution-Idle in GPU Clusters
by: Lei, Yiran, et al.
Published: (2026)
Hetis: Serving LLMs in Heterogeneous GPU Clusters with Fine-grained and Dynamic Parallelism
by: Mo, Zizhao, et al.
Published: (2025)
HAP: SPMD DNN Training on Heterogeneous GPU Clusters with Automated Program Synthesis
by: Zhang, Shiwei, et al.
Published: (2024)
PAL: A Variability-Aware Policy for Scheduling ML Workloads in GPU Clusters
by: Jain, Rutwik, et al.
Published: (2024)
PRISM: Dynamic Primitive-Based Forecasting for Large-Scale GPU Cluster Workloads
by: Wu, Xin, et al.
Published: (2026)
Frenzy: A Memory-Aware Serverless LLM Training System for Heterogeneous GPU Clusters
by: Chang, Zihan, et al.
Published: (2024)
TurboFNO: High-Performance Fourier Neural Operator with Fused FFT-GEMM-iFFT on GPU
by: Wu, Shixun, et al.
Published: (2025)
ParallelKittens: Systematic and Practical Simplification of Multi-GPU AI Kernels
by: Sul, Stuart H., et al.
Published: (2025)
GFS: A Preemption-aware Scheduling Framework for GPU Clusters with Predictive Spot Instance Management
by: Duan, Jiaang, et al.
Published: (2025)
Scheduling Deep Learning Jobs in Multi-Tenant GPU Clusters via Wise Resource Sharing
by: Luo, Yizhou, et al.
Published: (2024)
GPZ: GPU-Accelerated Lossy Compressor for Particle Data
by: Li, Ruoyu, et al.
Published: (2025)
HAS-GPU: Efficient Hybrid Auto-scaling with Fine-grained GPU Allocation for SLO-aware Serverless Inferences
by: Gu, Jianfeng, et al.
Published: (2025)
PPipe: Efficient Video Analytics Serving on Heterogeneous GPU Clusters via Pool-Based Pipeline Parallelism
by: Kong, Z. Jonny, et al.
Published: (2025)
Challenging GPU Dominance: When CPUs Outperform for On-Device LLM Inference
by: Zhang, Haolin, et al.
Published: (2025)
Heat: Satellite's meat is GPU's poison
by: Yuan, Zhehu, et al.
Published: (2024)
Scaling Up Throughput-oriented LLM Inference Applications on Heterogeneous Opportunistic GPU Clusters with Pervasive Context Management
by: Phung, Thanh Son, et al.
Published: (2025)
The Fused Kernel Library: A C++ API to Develop Highly-Efficient GPU Libraries
by: Amoros, Oscar, et al.
Published: (2025)
Combining GPU and CPU for accelerating evolutionary computing workloads
by: Eynaliyev, Rustam, et al.
Published: (2025)
Efficiently Executing High-throughput Lightweight LLM Inference Applications on Heterogeneous Opportunistic GPU Clusters with Pervasive Context Management
by: Phung, Thanh Son, et al.
Published: (2025)
Characterization-Guided GPU Fault Resilience in NVIDIA MPS
by: Liu, Rixin, et al.
Published: (2026)
Minos: Systematically Classifying Performance and Power Characteristics of GPU Workloads on HPC Clusters
by: Jain, Rutwik, et al.
Published: (2026)
Improving GPU Multi-Tenancy Through Dynamic Multi-Instance GPU Reconfiguration
by: Wang, Tianyu, et al.
Published: (2024)
Performance Characterization of Distributed Deep Learning Strategies: A Quantitative Evaluation of DDP, FSDP, and Parameter Server Architectures on GPU Clusters
by: Ovi, Md Sultanul Islam
Published: (2025)
Towards Fast Setup and High Throughput of GPU Serverless Computing
by: Zhao, Han, et al.
Published: (2024)
GPU-Accelerated Batch-Dynamic Subgraph Matching
by: Qiu, Linshan, et al.
Published: (2024)
TurboFFT: A High-Performance Fast Fourier Transform with Fault Tolerance on GPU
by: Wu, Shixun, et al.
Published: (2024)