| Main Authors: | Schmitz, Donatien, Rosinosky, Guillaume, Rivière, Etienne |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2505.19739 |
Similar Items
COoL-TEE: Client-TEE Collaboration for Resilient Distributed Search
by: Bettinger, Matthieu, et al.
Published: (2025)
Breaking the Memory Wall: A Study of I/O Patterns and GPU Memory Utilization for Hybrid CPU-GPU Offloaded Optimizers
by: Maurya, Avinash, et al.
Published: (2024)
Benchmarking of CPU-intensive Stream Data Processing in The Edge Computing Systems
by: Szydlo, Tomasz, et al.
Published: (2025)
Dissecting the software-based measurement of CPU energy consumption: a comparative analysis
by: Raffin, Guillaume, et al.
Published: (2024)
Efficient Graph Embedding at Scale: Optimizing CPU-GPU-SSD Integration
by: Li, Zhonggen, et al.
Published: (2025)
Efficient CPU-GPU Collaborative Inference for MoE-based LLMs on Memory-Limited Systems
by: Huang, En-Ming, et al.
Published: (2025)
Harnessing Integrated CPU-GPU System Memory for HPC: a first look into Grace Hopper
by: Schieffer, Gabin, et al.
Published: (2024)
Serving Hybrid LLM Loads with SLO Guarantees Using CPU-GPU Attention Piggybacking
by: Mo, Zizhao, et al.
Published: (2026)
Pie: Pooling CPU Memory for LLM Inference
by: Xu, Yi, et al.
Published: (2024)
HybridGen: Efficient LLM Generative Inference via CPU-GPU Hybrid Computing
by: Lin, Mao, et al.
Published: (2026)
Towards CXL Resilience to CPU Failures
by: Psistakis, Antonis, et al.
Published: (2026)
Elastic Data Transfer Optimization with Hybrid Reinforcement Learning
by: Swargo, Rasman Mubtasim, et al.
Published: (2025)
Dissecting CPU-GPU Unified Physical Memory on AMD MI300A APUs
by: Wahlgren, Jacob, et al.
Published: (2025)
Daedalus: Self-Adaptive Horizontal Autoscaling for Resource Efficiency of Distributed Stream Processing Systems
by: Pfister, Benjamin J. J., et al.
Published: (2024)
StatuScale: Status-aware and Elastic Scaling Strategy for Microservice Applications
by: Wen, Linfeng, et al.
Published: (2024)
eLLM: Elastic Memory Management Framework for Efficient LLM Serving
by: Xu, Jiale, et al.
Published: (2025)
Scaling Large-scale GNN Training to Thousands of Processors on CPU-based Supercomputers
by: Zhuang, Chen, et al.
Published: (2024)
Demeter: Resource-Efficient Distributed Stream Processing under Dynamic Loads with Multi-Configuration Optimization
by: Geldenhuys, Morgan, et al.
Published: (2024)
WindVE: Collaborative CPU-NPU Vector Embedding
by: Huang, Jinqi, et al.
Published: (2025)
Combining GPU and CPU for accelerating evolutionary computing workloads
by: Eynaliyev, Rustam, et al.
Published: (2025)
A Unified CPU-GPU Protocol for GNN Training
by: Lin, Yi-Chien, et al.
Published: (2024)
MemServe: Context Caching for Disaggregated LLM Serving with Elastic Memory Pool
by: Hu, Cunchen, et al.
Published: (2024)
ElasWave: An Elastic-Native System for Scalable Hybrid-Parallel Training
by: Kang, Xueze, et al.
Published: (2025)
TURNIP: A "Nondeterministic" GPU Runtime with CPU RAM Offload
by: Ding, Zhimin, et al.
Published: (2024)
PDSP-Bench: A Benchmarking System for Parallel and Distributed Stream Processing
by: Agnihotri, Pratyush, et al.
Published: (2025)
StreamRL: Scalable, Heterogeneous, and Elastic RL for LLMs with Disaggregated Stream Generation
by: Zhong, Yinmin, et al.
Published: (2025)
Efficient Column-Wise N:M Pruning on RISC-V CPU
by: Chu, Chi-Wei, et al.
Published: (2025)
A Unified Programming Model for Heterogeneous Computing with CPU and Accelerator Technologies
by: Xiong, Yuqing
Published: (2022)
ShuffleBench: A Benchmark for Large-Scale Data Shuffling Operations with Distributed Stream Processing Frameworks
by: Henning, Sören, et al.
Published: (2024)
Collaborative Multi-Agent Reinforcement Learning Approach for Elastic Cloud Resource Scaling
by: Fang, Bruce, et al.
Published: (2025)
ElasticMoE: An Efficient Auto Scaling Method for Mixture-of-Experts Models
by: Singh, Gursimran, et al.
Published: (2025)
Towards Affordable, Adaptive and Automatic GNN Training on CPU-GPU Heterogeneous Platforms
by: Qiao, Tong, et al.
Published: (2025)
MMStencil: Optimizing High-order Stencils on Multicore CPU using Matrix Unit
by: Wang, Yinuo, et al.
Published: (2025)
Performance characterisation of the 64-core SG2042 RISC-V CPU for HPC
by: Brown, Nick, et al.
Published: (2024)
Taming GPU Underutilization via Static Partitioning and Fine-grained CPU Offloading
by: Schieffer, Gabin, et al.
Published: (2026)
A Study of Performance Programming of CPU, GPU accelerated Computers and SIMD Architecture
by: Yi, Xinyao
Published: (2024)
Dual-pronged deep learning preprocessing on heterogeneous platforms with CPU, Accelerator and CSD
by: Wei, Jia, et al.
Published: (2024)
madupite: A High-Performance Distributed Solver for Large-Scale Markov Decision Processes
by: Gargiani, Matilde, et al.
Published: (2025)
Towards Fine-Grained Scalability for Stateful Stream Processing Systems
by: Qing, Yunfan, et al.
Published: (2025)
APEX: Asynchronous Parallel CPU-GPU Execution for Online LLM Inference on Constrained GPUs
by: Fan, Jiakun, et al.
Published: (2025)