:: Library Catalog

Bibliographic Details
Main Authors:	Guo, Zizheng; Liu, Haichuan; Shi, Xizhe; Hua, Shenglu; Zhang, Zuodong; Zhao, Chunyuan; Wang, Runsheng; Lin, Yibo
Format:	Preprint
Published:	2025
Subjects:	Distributed, Parallel, and Cluster Computing
Online Access:	https://arxiv.org/abs/2511.11660

Similar Items

Taming GPU Underutilization via Static Partitioning and Fine-grained CPU Offloading
by: Schieffer, Gabin, et al.
Published: (2026)

Cephalo: Harnessing Heterogeneous GPU Clusters for Training Transformer Models
by: Guo, Runsheng Benson, et al.
Published: (2024)

Zorse: Optimizing LLM Training Efficiency on Heterogeneous GPU Clusters
by: Guo, Runsheng Benson, et al.
Published: (2025)

Towards Affordable, Adaptive and Automatic GNN Training on CPU-GPU Heterogeneous Platforms
by: Qiao, Tong, et al.
Published: (2025)

HybriMoE: Hybrid CPU-GPU Scheduling and Cache Management for Efficient MoE Inference
by: Zhong, Shuzhang, et al.
Published: (2025)

Co-Design and Evaluation of a CPU-Free MPI GPU Communication Abstraction and Implementation
by: Bridges, Patrick G., et al.
Published: (2026)

A Unified CPU-GPU Protocol for GNN Training
by: Lin, Yi-Chien, et al.
Published: (2024)

Combining GPU and CPU for accelerating evolutionary computing workloads
by: Eynaliyev, Rustam, et al.
Published: (2025)

Orchestrated Co-scheduling, Resource Partitioning, and Power Capping on CPU-GPU Heterogeneous Systems via Machine Learning
by: Saba, Issa, et al.
Published: (2024)

TURNIP: A "Nondeterministic" GPU Runtime with CPU RAM Offload
by: Ding, Zhimin, et al.
Published: (2024)

Efficient Graph Embedding at Scale: Optimizing CPU-GPU-SSD Integration
by: Li, Zhonggen, et al.
Published: (2025)

A Study of Performance Programming of CPU, GPU accelerated Computers and SIMD Architecture
by: Yi, Xinyao
Published: (2024)

Dissecting CPU-GPU Unified Physical Memory on AMD MI300A APUs
by: Wahlgren, Jacob, et al.
Published: (2025)

APEX: Asynchronous Parallel CPU-GPU Execution for Online LLM Inference on Constrained GPUs
by: Fan, Jiakun, et al.
Published: (2025)

Serving Hybrid LLM Loads with SLO Guarantees Using CPU-GPU Attention Piggybacking
by: Mo, Zizhao, et al.
Published: (2026)

Breaking the Memory Wall: A Study of I/O Patterns and GPU Memory Utilization for Hybrid CPU-GPU Offloaded Optimizers
by: Maurya, Avinash, et al.
Published: (2024)

Efficient CPU-GPU Collaborative Inference for MoE-based LLMs on Memory-Limited Systems
by: Huang, En-Ming, et al.
Published: (2025)

SiPipe: Bridging the CPU-GPU Utilization Gap for Efficient Pipeline-Parallel LLM Inference
by: He, Yongchao, et al.
Published: (2025)

Harnessing Integrated CPU-GPU System Memory for HPC: a first look into Grace Hopper
by: Schieffer, Gabin, et al.
Published: (2024)

A Unified Programming Model for Heterogeneous Computing with CPU and Accelerator Technologies
by: Xiong, Yuqing
Published: (2022)

Cost-Performance Analysis: A Comparative Study of CPU-Based Serverless and GPU-Based Training Architectures
by: Barrak, Amine, et al.
Published: (2025)

Characterizing CPU-Induced Slowdowns in Multi-GPU LLM Inference
by: Chung, Euijun, et al.
Published: (2026)

Poplar: Efficient Scaling of Distributed DNN Training on Heterogeneous GPU Clusters
by: Zhang, WenZheng, et al.
Published: (2024)

HybridGen: Efficient LLM Generative Inference via CPU-GPU Hybrid Computing
by: Lin, Mao, et al.
Published: (2026)

HARP: Orchestrating Automated Parallel Training on Heterogeneous GPU Clusters
by: Liang, Antian, et al.
Published: (2025)

AcOrch: Accelerating Sampling-based GNN Training under CPU-NPU Heterogeneous Environments
by: Chen, Kefu, et al.
Published: (2026)

AME: An Efficient Heterogeneous Agentic Memory Engine for Smartphones
by: Zhao, Xinkui, et al.
Published: (2025)

Comparing CPU and GPU compute of PERMANOVA on MI300A
by: Sfiligoi, Igor
Published: (2025)

Optimizing Task Scheduling in Heterogeneous Computing Environments: A Comparative Analysis of CPU, GPU, and ASIC Platforms Using E2C Simulator
by: Mohammadjafari, Ali, et al.
Published: (2024)

Optimal Resource Efficiency with Fairness in Heterogeneous GPU Clusters
by: Mo, Zizhao, et al.
Published: (2024)

Towards CXL Resilience to CPU Failures
by: Psistakis, Antonis, et al.
Published: (2026)

xMem: A CPU-Based Approach for Accurate Estimation of GPU Memory in Deep Learning Training Workloads
by: Shi, Jiabo, et al.
Published: (2025)

A Parallel CPU-GPU Framework for Batching Heuristic Operations in Depth-First Heuristic Search
by: Futuhi, Ehsan, et al.
Published: (2025)

Parallel CPU- and GPU-based connected component algorithms for event building for hybrid pixel detectors
by: Čelko, Tomáš, et al.
Published: (2024)

Accelerating Mobile Inference through Fine-Grained CPU-GPU Co-Execution
by: Li, Zhuojin, et al.
Published: (2025)

An Efficient Heterogeneous Co-Design for Fine-Tuning on a Single GPU
by: Yang, Ruijia, et al.
Published: (2026)

WindVE: Collaborative CPU-NPU Vector Embedding
by: Huang, Jinqi, et al.
Published: (2025)

Optimizing Allreduce Operations for Modern Heterogeneous Architectures with Multiple Processes per GPU
by: Adams, Michael, et al.
Published: (2025)

Hetis: Serving LLMs in Heterogeneous GPU Clusters with Fine-grained and Dynamic Parallelism
by: Mo, Zizhao, et al.
Published: (2025)

HAP: SPMD DNN Training on Heterogeneous GPU Clusters with Automated Program Synthesis
by: Zhang, Shiwei, et al.
Published: (2024)