:: Library Catalog

Bibliographic Details
Main Authors:	Schmitz, Donatien, Rosinosky, Guillaume, Rivière, Etienne
Format:	Preprint
Published:	2025
Subjects:	Distributed, Parallel, and Cluster Computing
Online Access:	https://arxiv.org/abs/2505.19739

Similar Items

COoL-TEE: Client-TEE Collaboration for Resilient Distributed Search
by: Bettinger, Matthieu, et al.
Published: (2025)

Breaking the Memory Wall: A Study of I/O Patterns and GPU Memory Utilization for Hybrid CPU-GPU Offloaded Optimizers
by: Maurya, Avinash, et al.
Published: (2024)

Benchmarking of CPU-intensive Stream Data Processing in The Edge Computing Systems
by: Szydlo, Tomasz, et al.
Published: (2025)

Dissecting the software-based measurement of CPU energy consumption: a comparative analysis
by: Raffin, Guillaume, et al.
Published: (2024)

Efficient Graph Embedding at Scale: Optimizing CPU-GPU-SSD Integration
by: Li, Zhonggen, et al.
Published: (2025)

Efficient CPU-GPU Collaborative Inference for MoE-based LLMs on Memory-Limited Systems
by: Huang, En-Ming, et al.
Published: (2025)

Harnessing Integrated CPU-GPU System Memory for HPC: a first look into Grace Hopper
by: Schieffer, Gabin, et al.
Published: (2024)

Serving Hybrid LLM Loads with SLO Guarantees Using CPU-GPU Attention Piggybacking
by: Mo, Zizhao, et al.
Published: (2026)

Pie: Pooling CPU Memory for LLM Inference
by: Xu, Yi, et al.
Published: (2024)

HybridGen: Efficient LLM Generative Inference via CPU-GPU Hybrid Computing
by: Lin, Mao, et al.
Published: (2026)

Towards CXL Resilience to CPU Failures
by: Psistakis, Antonis, et al.
Published: (2026)

Elastic Data Transfer Optimization with Hybrid Reinforcement Learning
by: Swargo, Rasman Mubtasim, et al.
Published: (2025)

Dissecting CPU-GPU Unified Physical Memory on AMD MI300A APUs
by: Wahlgren, Jacob, et al.
Published: (2025)

Daedalus: Self-Adaptive Horizontal Autoscaling for Resource Efficiency of Distributed Stream Processing Systems
by: Pfister, Benjamin J. J., et al.
Published: (2024)

StatuScale: Status-aware and Elastic Scaling Strategy for Microservice Applications
by: Wen, Linfeng, et al.
Published: (2024)

eLLM: Elastic Memory Management Framework for Efficient LLM Serving
by: Xu, Jiale, et al.
Published: (2025)

Scaling Large-scale GNN Training to Thousands of Processors on CPU-based Supercomputers
by: Zhuang, Chen, et al.
Published: (2024)

Demeter: Resource-Efficient Distributed Stream Processing under Dynamic Loads with Multi-Configuration Optimization
by: Geldenhuys, Morgan, et al.
Published: (2024)

WindVE: Collaborative CPU-NPU Vector Embedding
by: Huang, Jinqi, et al.
Published: (2025)

Combining GPU and CPU for accelerating evolutionary computing workloads
by: Eynaliyev, Rustam, et al.
Published: (2025)

A Unified CPU-GPU Protocol for GNN Training
by: Lin, Yi-Chien, et al.
Published: (2024)

MemServe: Context Caching for Disaggregated LLM Serving with Elastic Memory Pool
by: Hu, Cunchen, et al.
Published: (2024)

ElasWave: An Elastic-Native System for Scalable Hybrid-Parallel Training
by: Kang, Xueze, et al.
Published: (2025)

TURNIP: A "Nondeterministic" GPU Runtime with CPU RAM Offload
by: Ding, Zhimin, et al.
Published: (2024)

PDSP-Bench: A Benchmarking System for Parallel and Distributed Stream Processing
by: Agnihotri, Pratyush, et al.
Published: (2025)

StreamRL: Scalable, Heterogeneous, and Elastic RL for LLMs with Disaggregated Stream Generation
by: Zhong, Yinmin, et al.
Published: (2025)

Efficient Column-Wise N:M Pruning on RISC-V CPU
by: Chu, Chi-Wei, et al.
Published: (2025)

A Unified Programming Model for Heterogeneous Computing with CPU and Accelerator Technologies
by: Xiong, Yuqing
Published: (2022)

ShuffleBench: A Benchmark for Large-Scale Data Shuffling Operations with Distributed Stream Processing Frameworks
by: Henning, Sören, et al.
Published: (2024)

Collaborative Multi-Agent Reinforcement Learning Approach for Elastic Cloud Resource Scaling
by: Fang, Bruce, et al.
Published: (2025)

ElasticMoE: An Efficient Auto Scaling Method for Mixture-of-Experts Models
by: Singh, Gursimran, et al.
Published: (2025)

Towards Affordable, Adaptive and Automatic GNN Training on CPU-GPU Heterogeneous Platforms
by: Qiao, Tong, et al.
Published: (2025)

MMStencil: Optimizing High-order Stencils on Multicore CPU using Matrix Unit
by: Wang, Yinuo, et al.
Published: (2025)

Performance characterisation of the 64-core SG2042 RISC-V CPU for HPC
by: Brown, Nick, et al.
Published: (2024)

Taming GPU Underutilization via Static Partitioning and Fine-grained CPU Offloading
by: Schieffer, Gabin, et al.
Published: (2026)

A Study of Performance Programming of CPU, GPU accelerated Computers and SIMD Architecture
by: Yi, Xinyao
Published: (2024)

Dual-pronged deep learning preprocessing on heterogeneous platforms with CPU, Accelerator and CSD
by: Wei, Jia, et al.
Published: (2024)

madupite: A High-Performance Distributed Solver for Large-Scale Markov Decision Processes
by: Gargiani, Matilde, et al.
Published: (2025)

Towards Fine-Grained Scalability for Stateful Stream Processing Systems
by: Qing, Yunfan, et al.
Published: (2025)

APEX: Asynchronous Parallel CPU-GPU Execution for Online LLM Inference on Constrained GPUs
by: Fan, Jiakun, et al.
Published: (2025)