Library Catalog

Bibliographic Details
Main Authors:	Zhu, Zhu; Sun, Yu; Parakal, Dhatri; Fang, Bo; Farrell, Steven; Bauer, Gregory H.; Bode, Brett; Foster, Ian T.; Papka, Michael E.; Gropp, William; Zhang, Zhao; Yang, Lishan
Format:	Preprint
Published:	2025
Subjects:	Distributed, Parallel, and Cluster Computing
Online Access:	https://arxiv.org/abs/2508.03513

Similar Items

MalleTrain: Deep Neural Network Training on Unfillable Supercomputer Nodes
by: Ma, Xiaolong, et al.
Published: (2024)

Story of Two GPUs: Characterizing the Resilience of Hopper H100 and Ampere A100 GPUs
by: Cui, Shengkun, et al.
Published: (2025)

Object Proxy Patterns for Accelerating Distributed Applications
by: Pauloski, J. Gregory, et al.
Published: (2024)

More for Less: Integrating Capability-Predominant and Capacity-Predominant Computing
by: Zheng, Zhong, et al.
Published: (2025)

Towards Energy Efficient Co-Scheduling in HPC
by: Zheng, Zhong, et al.
Published: (2026)

EcoShift: Performance-Aware Power Management for Power-Constrained Heterogeneous Systems
by: Zheng, Zhong, et al.
Published: (2026)

CUTHERMO: Understanding GPU Memory Inefficiencies with Heat Map Profiling
by: Zhao, Yanbo, et al.
Published: (2025)

Understanding Large-Scale HPC System Behavior Through Cluster-Based Visual Analytics
by: Austin, Allison, et al.
Published: (2026)

Exploring Uncore Frequency Scaling for Heterogeneous Computing
by: Zheng, Zhong, et al.
Published: (2025)

An Incremental Multi-Level, Multi-Scale Approach to Assessment of Multifidelity HPC Systems
by: Shilpika, Shilpika, et al.
Published: (2025)

Computational Grids
by: Foster, Ian, et al.
Published: (2025)

Breaking the Memory Wall: A Study of I/O Patterns and GPU Memory Utilization for Hybrid CPU-GPU Offloaded Optimizers
by: Maurya, Avinash, et al.
Published: (2024)

Heat: Satellite's meat is GPU's poison
by: Yuan, Zhehu, et al.
Published: (2024)

Accelerating Python Applications with Dask and ProxyStore
by: Pauloski, J. Gregory, et al.
Published: (2024)

Byzantine-Tolerant Consensus in GPU-Inspired Shared Memory
by: Georgiou, Chryssis, et al.
Published: (2025)

A Real-Time Digital Twin for Adaptive Scheduling
by: Zhang, Yihe, et al.
Published: (2025)

Understanding GPU Triggering APIs for MPI+X Communication
by: Bridges, Patrick G., et al.
Published: (2024)

Understanding GPU Resource Interference One Level Deeper
by: Elvinger, Paul, et al.
Published: (2025)

PilotANN: Memory-Bounded GPU Acceleration for Vector Search
by: Gui, Yuntao, et al.
Published: (2025)

Coordinated Power Management on Heterogeneous Systems
by: Zheng, Zhong, et al.
Published: (2025)

The Landscape of GPU-Centric Communication
by: Unat, Didem, et al.
Published: (2024)

GMLake: Efficient and Transparent GPU Memory Defragmentation for Large-scale DNN Training with Virtual Memory Stitching
by: Guo, Cong, et al.
Published: (2024)

Experiences with Model Context Protocol Servers for Science and High Performance Computing
by: Pan, Haochen, et al.
Published: (2025)

A Preliminary Study on Accelerating Simulation Optimization with GPU Implementation
by: He, Jinghai, et al.
Published: (2024)

DuaLip-GPU Technical Report
by: Dexter, Gregory, et al.
Published: (2026)

Efficient Graph Embedding at Scale: Optimizing CPU-GPU-SSD Integration
by: Li, Zhonggen, et al.
Published: (2025)

Agora: Bridging the GPU Cloud Resource-Price Disconnect
by: McDougall, Ian, et al.
Published: (2025)

GPU Memory and Utilization Estimation for Training-Aware Resource Management: Opportunities and Limitations
by: Yousefzadeh-Asl-Miandoab, Ehsan, et al.
Published: (2026)

AQUA: Network-Accelerated Memory Offloading for LLMs in Scale-Up GPU Domains
by: Kumar, Abhishek Vijaya, et al.
Published: (2024)

Understanding Data Movement in AMD Multi-GPU Systems with Infinity Fabric
by: Schieffer, Gabin, et al.
Published: (2024)

GreenFaaS: Maximizing Energy Efficiency of HPC Workloads with FaaS
by: Kamatar, Alok, et al.
Published: (2024)

CCCL: Node-Spanning GPU Collectives with CXL Memory Pooling
by: Xu, Dong, et al.
Published: (2026)

MRSch: Multi-Resource Scheduling for HPC
by: Li, Boyang, et al.
Published: (2024)

Frenzy: A Memory-Aware Serverless LLM Training System for Heterogeneous GPU Clusters
by: Chang, Zihan, et al.
Published: (2024)

DAK: Direct-Access-Enabled GPU Memory Offloading with Optimal Efficiency for LLM Inference
by: Lin, Shouxu, et al.
Published: (2026)

Core Hours and Carbon Credits: Incentivizing Sustainability in HPC
by: Kamatar, Alok, et al.
Published: (2025)

Mind the Memory Gap: Unveiling GPU Bottlenecks in Large-Batch LLM Inference
by: Recasens, Pol G., et al.
Published: (2025)

FlashMem: Supporting Modern DNN Workloads on Mobile with GPU Memory Hierarchy Optimizations
by: Shu, Zhihao, et al.
Published: (2026)

HC-SpMM: Accelerating Sparse Matrix-Matrix Multiplication for Graphs with Hybrid GPU Cores
by: Li, Zhonggen, et al.
Published: (2024)

CRIUgpu: Transparent Checkpointing of GPU-Accelerated Workloads
by: Stoyanov, Radostin, et al.
Published: (2025)