| Main Authors: | Guo, Zizheng, Liu, Haichuan, Shi, Xizhe, Hua, Shenglu, Zhang, Zuodong, Zhao, Chunyuan, Wang, Runsheng, Lin, Yibo |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2511.11660 |
Similar Items
Taming GPU Underutilization via Static Partitioning and Fine-grained CPU Offloading
by: Schieffer, Gabin, et al.
Published: (2026)
Cephalo: Harnessing Heterogeneous GPU Clusters for Training Transformer Models
by: Guo, Runsheng Benson, et al.
Published: (2024)
Zorse: Optimizing LLM Training Efficiency on Heterogeneous GPU Clusters
by: Guo, Runsheng Benson, et al.
Published: (2025)
Towards Affordable, Adaptive and Automatic GNN Training on CPU-GPU Heterogeneous Platforms
by: Qiao, Tong, et al.
Published: (2025)
HybriMoE: Hybrid CPU-GPU Scheduling and Cache Management for Efficient MoE Inference
by: Zhong, Shuzhang, et al.
Published: (2025)
Co-Design and Evaluation of a CPU-Free MPI GPU Communication Abstraction and Implementation
by: Bridges, Patrick G., et al.
Published: (2026)
A Unified CPU-GPU Protocol for GNN Training
by: Lin, Yi-Chien, et al.
Published: (2024)
Combining GPU and CPU for accelerating evolutionary computing workloads
by: Eynaliyev, Rustam, et al.
Published: (2025)
Orchestrated Co-scheduling, Resource Partitioning, and Power Capping on CPU-GPU Heterogeneous Systems via Machine Learning
by: Saba, Issa, et al.
Published: (2024)
TURNIP: A "Nondeterministic" GPU Runtime with CPU RAM Offload
by: Ding, Zhimin, et al.
Published: (2024)
Efficient Graph Embedding at Scale: Optimizing CPU-GPU-SSD Integration
by: Li, Zhonggen, et al.
Published: (2025)
A Study of Performance Programming of CPU, GPU accelerated Computers and SIMD Architecture
by: Yi, Xinyao
Published: (2024)
Dissecting CPU-GPU Unified Physical Memory on AMD MI300A APUs
by: Wahlgren, Jacob, et al.
Published: (2025)
APEX: Asynchronous Parallel CPU-GPU Execution for Online LLM Inference on Constrained GPUs
by: Fan, Jiakun, et al.
Published: (2025)
Serving Hybrid LLM Loads with SLO Guarantees Using CPU-GPU Attention Piggybacking
by: Mo, Zizhao, et al.
Published: (2026)
Breaking the Memory Wall: A Study of I/O Patterns and GPU Memory Utilization for Hybrid CPU-GPU Offloaded Optimizers
by: Maurya, Avinash, et al.
Published: (2024)
Efficient CPU-GPU Collaborative Inference for MoE-based LLMs on Memory-Limited Systems
by: Huang, En-Ming, et al.
Published: (2025)
SiPipe: Bridging the CPU-GPU Utilization Gap for Efficient Pipeline-Parallel LLM Inference
by: He, Yongchao, et al.
Published: (2025)
Harnessing Integrated CPU-GPU System Memory for HPC: a first look into Grace Hopper
by: Schieffer, Gabin, et al.
Published: (2024)
A Unified Programming Model for Heterogeneous Computing with CPU and Accelerator Technologies
by: Xiong, Yuqing
Published: (2022)
Cost-Performance Analysis: A Comparative Study of CPU-Based Serverless and GPU-Based Training Architectures
by: Barrak, Amine, et al.
Published: (2025)
Characterizing CPU-Induced Slowdowns in Multi-GPU LLM Inference
by: Chung, Euijun, et al.
Published: (2026)
Poplar: Efficient Scaling of Distributed DNN Training on Heterogeneous GPU Clusters
by: Zhang, WenZheng, et al.
Published: (2024)
HybridGen: Efficient LLM Generative Inference via CPU-GPU Hybrid Computing
by: Lin, Mao, et al.
Published: (2026)
HARP: Orchestrating Automated Parallel Training on Heterogeneous GPU Clusters
by: Liang, Antian, et al.
Published: (2025)
AcOrch: Accelerating Sampling-based GNN Training under CPU-NPU Heterogeneous Environments
by: Chen, Kefu, et al.
Published: (2026)
AME: An Efficient Heterogeneous Agentic Memory Engine for Smartphones
by: Zhao, Xinkui, et al.
Published: (2025)
Comparing CPU and GPU compute of PERMANOVA on MI300A
by: Sfiligoi, Igor
Published: (2025)
Optimizing Task Scheduling in Heterogeneous Computing Environments: A Comparative Analysis of CPU, GPU, and ASIC Platforms Using E2C Simulator
by: Mohammadjafari, Ali, et al.
Published: (2024)
Optimal Resource Efficiency with Fairness in Heterogeneous GPU Clusters
by: Mo, Zizhao, et al.
Published: (2024)
Towards CXL Resilience to CPU Failures
by: Psistakis, Antonis, et al.
Published: (2026)
xMem: A CPU-Based Approach for Accurate Estimation of GPU Memory in Deep Learning Training Workloads
by: Shi, Jiabo, et al.
Published: (2025)
A Parallel CPU-GPU Framework for Batching Heuristic Operations in Depth-First Heuristic Search
by: Futuhi, Ehsan, et al.
Published: (2025)
Parallel CPU- and GPU-based connected component algorithms for event building for hybrid pixel detectors
by: Čelko, Tomáš, et al.
Published: (2024)
Accelerating Mobile Inference through Fine-Grained CPU-GPU Co-Execution
by: Li, Zhuojin, et al.
Published: (2025)
An Efficient Heterogeneous Co-Design for Fine-Tuning on a Single GPU
by: Yang, Ruijia, et al.
Published: (2026)
WindVE: Collaborative CPU-NPU Vector Embedding
by: Huang, Jinqi, et al.
Published: (2025)
Optimizing Allreduce Operations for Modern Heterogeneous Architectures with Multiple Processes per GPU
by: Adams, Michael, et al.
Published: (2025)
Hetis: Serving LLMs in Heterogeneous GPU Clusters with Fine-grained and Dynamic Parallelism
by: Mo, Zizhao, et al.
Published: (2025)
HAP: SPMD DNN Training on Heterogeneous GPU Clusters with Automated Program Synthesis
by: Zhang, Shiwei, et al.
Published: (2024)