| Main Authors: | Zhu, Zhu; Sun, Yu; Parakal, Dhatri; Fang, Bo; Farrell, Steven; Bauer, Gregory H.; Bode, Brett; Foster, Ian T.; Papka, Michael E.; Gropp, William; Zhang, Zhao; Yang, Lishan |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2508.03513 |
Similar Items
MalleTrain: Deep Neural Network Training on Unfillable Supercomputer Nodes
by: Ma, Xiaolong, et al.
Published: (2024)
Story of Two GPUs: Characterizing the Resilience of Hopper H100 and Ampere A100 GPUs
by: Cui, Shengkun, et al.
Published: (2025)
Object Proxy Patterns for Accelerating Distributed Applications
by: Pauloski, J. Gregory, et al.
Published: (2024)
More for Less: Integrating Capability-Predominant and Capacity-Predominant Computing
by: Zheng, Zhong, et al.
Published: (2025)
Towards Energy Efficient Co-Scheduling in HPC
by: Zheng, Zhong, et al.
Published: (2026)
EcoShift: Performance-Aware Power Management for Power-Constrained Heterogeneous Systems
by: Zheng, Zhong, et al.
Published: (2026)
CUTHERMO: Understanding GPU Memory Inefficiencies with Heat Map Profiling
by: Zhao, Yanbo, et al.
Published: (2025)
Understanding Large-Scale HPC System Behavior Through Cluster-Based Visual Analytics
by: Austin, Allison, et al.
Published: (2026)
Exploring Uncore Frequency Scaling for Heterogeneous Computing
by: Zheng, Zhong, et al.
Published: (2025)
An Incremental Multi-Level, Multi-Scale Approach to Assessment of Multifidelity HPC Systems
by: Shilpika, Shilpika, et al.
Published: (2025)
Computational Grids
by: Foster, Ian, et al.
Published: (2025)
Breaking the Memory Wall: A Study of I/O Patterns and GPU Memory Utilization for Hybrid CPU-GPU Offloaded Optimizers
by: Maurya, Avinash, et al.
Published: (2024)
Heat: Satellite's meat is GPU's poison
by: Yuan, Zhehu, et al.
Published: (2024)
Accelerating Python Applications with Dask and ProxyStore
by: Pauloski, J. Gregory, et al.
Published: (2024)
Byzantine-Tolerant Consensus in GPU-Inspired Shared Memory
by: Georgiou, Chryssis, et al.
Published: (2025)
A Real-Time Digital Twin for Adaptive Scheduling
by: Zhang, Yihe, et al.
Published: (2025)
Understanding GPU Triggering APIs for MPI+X Communication
by: Bridges, Patrick G., et al.
Published: (2024)
Understanding GPU Resource Interference One Level Deeper
by: Elvinger, Paul, et al.
Published: (2025)
PilotANN: Memory-Bounded GPU Acceleration for Vector Search
by: Gui, Yuntao, et al.
Published: (2025)
Coordinated Power Management on Heterogeneous Systems
by: Zheng, Zhong, et al.
Published: (2025)
The Landscape of GPU-Centric Communication
by: Unat, Didem, et al.
Published: (2024)
GMLake: Efficient and Transparent GPU Memory Defragmentation for Large-scale DNN Training with Virtual Memory Stitching
by: Guo, Cong, et al.
Published: (2024)
Experiences with Model Context Protocol Servers for Science and High Performance Computing
by: Pan, Haochen, et al.
Published: (2025)
A Preliminary Study on Accelerating Simulation Optimization with GPU Implementation
by: He, Jinghai, et al.
Published: (2024)
DuaLip-GPU Technical Report
by: Dexter, Gregory, et al.
Published: (2026)
Efficient Graph Embedding at Scale: Optimizing CPU-GPU-SSD Integration
by: Li, Zhonggen, et al.
Published: (2025)
Agora: Bridging the GPU Cloud Resource-Price Disconnect
by: McDougall, Ian, et al.
Published: (2025)
GPU Memory and Utilization Estimation for Training-Aware Resource Management: Opportunities and Limitations
by: Yousefzadeh-Asl-Miandoab, Ehsan, et al.
Published: (2026)
AQUA: Network-Accelerated Memory Offloading for LLMs in Scale-Up GPU Domains
by: Kumar, Abhishek Vijaya, et al.
Published: (2024)
Understanding Data Movement in AMD Multi-GPU Systems with Infinity Fabric
by: Schieffer, Gabin, et al.
Published: (2024)
GreenFaaS: Maximizing Energy Efficiency of HPC Workloads with FaaS
by: Kamatar, Alok, et al.
Published: (2024)
CCCL: Node-Spanning GPU Collectives with CXL Memory Pooling
by: Xu, Dong, et al.
Published: (2026)
MRSch: Multi-Resource Scheduling for HPC
by: Li, Boyang, et al.
Published: (2024)
Frenzy: A Memory-Aware Serverless LLM Training System for Heterogeneous GPU Clusters
by: Chang, Zihan, et al.
Published: (2024)
DAK: Direct-Access-Enabled GPU Memory Offloading with Optimal Efficiency for LLM Inference
by: Lin, Shouxu, et al.
Published: (2026)
Core Hours and Carbon Credits: Incentivizing Sustainability in HPC
by: Kamatar, Alok, et al.
Published: (2025)
Mind the Memory Gap: Unveiling GPU Bottlenecks in Large-Batch LLM Inference
by: Recasens, Pol G., et al.
Published: (2025)
FlashMem: Supporting Modern DNN Workloads on Mobile with GPU Memory Hierarchy Optimizations
by: Shu, Zhihao, et al.
Published: (2026)
HC-SpMM: Accelerating Sparse Matrix-Matrix Multiplication for Graphs with Hybrid GPU Cores
by: Li, Zhonggen, et al.
Published: (2024)
CRIUgpu: Transparent Checkpointing of GPU-Accelerated Workloads
by: Stoyanov, Radostin, et al.
Published: (2025)