Bibliographic Details
Main Authors: Poptani, Akash; Khadem, Alireza; Mahlke, Scott; Miller, Jonah; Dolence, Joshua; Das, Reetuparna
Format: Preprint
Published: 2025
Online Access: https://arxiv.org/abs/2509.19701
Abstract:
  • Hero-class HPC simulations rely on Adaptive Mesh Refinement (AMR) to reduce compute and memory demands while maintaining accuracy. This work analyzes the performance of Parthenon, a block-structured AMR benchmark, on CPU-GPU systems. We show that smaller mesh blocks and deeper AMR levels degrade GPU performance due to increased communication, serial overheads, and inefficient GPU utilization. Through detailed profiling, we identify inefficiencies, low occupancy, and memory access bottlenecks. We further analyze rank scalability and memory constraints, and propose optimizations to improve GPU throughput and reduce memory footprint. Our insights can inform future AMR deployments on the Department of Energy's upcoming heterogeneous supercomputers.