Bibliographic Details
Main Authors: Abdulah, Sameh, Baker, Allison H., Bosilca, George, Cao, Qinglei, Castruccio, Stefano, Genton, Marc G., Keyes, David E., Khalid, Zubair, Ltaief, Hatem, Song, Yan, Stenchikov, Georgiy L., Sun, Ying
Format: Preprint
Published: 2024
Subjects: Computation
Online Access: https://arxiv.org/abs/2408.04440
author Abdulah, Sameh
Baker, Allison H.
Bosilca, George
Cao, Qinglei
Castruccio, Stefano
Genton, Marc G.
Keyes, David E.
Khalid, Zubair
Ltaief, Hatem
Song, Yan
Stenchikov, Georgiy L.
Sun, Ying
contents We present the design and scalable implementation of an exascale climate emulator that addresses the escalating computational and storage requirements of high-resolution Earth System Model simulations. We use the spherical harmonic transform to stochastically model spatio-temporal variations in climate data. This provides tunable spatio-temporal resolution and significantly improves the fidelity and granularity of climate emulation, achieving an ultra-high spatial resolution of 0.034° (approximately 3.5 km). Our emulator, trained on 318 billion hourly temperature data points from a 35-year global simulation ensemble and 31 billion daily data points from an 83-year global simulation ensemble, generates statistically consistent climate emulations. We extend linear solver software to mixed-precision arithmetic on GPUs, applying different precisions within a single solver to adapt to different correlation strengths. The PaRSEC runtime system supports efficient parallel matrix operations by dynamically balancing computation, communication, and memory requirements. Our BLAS3-rich code is optimized for systems equipped with four different families and generations of GPUs and scales well, achieving 0.976 EFlop/s on 9,025 nodes (36,100 AMD MI250X multichip module (MCM) GPUs) of Frontier (nearly the full system), 0.739 EFlop/s on 1,936 nodes (7,744 Grace-Hopper Superchips (GH200)) of Alps, 0.243 EFlop/s on 1,024 nodes (4,096 A100 GPUs) of Leonardo, and 0.375 EFlop/s on 3,072 nodes (18,432 V100 GPUs) of Summit.
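The abstract states that the solver applies different precisions within a single solver to adapt to different correlation strengths. The following is a minimal Python/NumPy sketch of one way such a per-tile choice can be made for a tiled covariance matrix. The exponential kernel, the tile size, the norm-based selection rule, and the names `exponential_covariance`, `tile_precision_map`, and `round_tiles` are illustrative assumptions, not the authors' PaRSEC-based implementation, and lower precisions are only emulated by rounding tiles through `float16`/`float32`.

```python
import numpy as np

# Candidate storage precisions, lowest first. All arithmetic below is float64;
# lower precisions are only emulated by rounding (an assumption of this sketch).
PRECISIONS = [("fp16", np.float16), ("fp32", np.float32), ("fp64", np.float64)]


def exponential_covariance(x, range_param=0.1):
    """Dense covariance of 1-D locations x under an exponential kernel (illustrative model)."""
    dist = np.abs(x[:, None] - x[None, :])
    return np.exp(-dist / range_param)


def tile_precision_map(A, nb, eps_target=1e-5):
    """Choose a storage precision for every nb x nb tile of the symmetric matrix A.

    Hypothetical norm-based rule: tile (i, j) gets the lowest precision whose
    unit roundoff u satisfies u * ||A_ij||_F * NT <= eps_target * ||A||_F.
    Tiles holding weak correlations (small norms) therefore drop to lower
    precision; diagonal tiles always stay in fp64.
    """
    n = A.shape[0]
    assert n % nb == 0, "this sketch assumes nb divides n"
    nt = n // nb
    total = np.linalg.norm(A)  # Frobenius norm of the whole matrix
    pmap = [["fp64"] * nt for _ in range(nt)]
    for i in range(nt):
        for j in range(nt):
            if i == j:
                continue  # keep diagonal tiles in fp64
            tnorm = np.linalg.norm(A[i * nb:(i + 1) * nb, j * nb:(j + 1) * nb])
            for name, dtype in PRECISIONS:
                u = np.finfo(dtype).eps / 2  # unit roundoff of this format
                if u * tnorm * nt <= eps_target * total:
                    pmap[i][j] = name
                    break
    return pmap


def round_tiles(A, nb, pmap):
    """Emulate mixed-precision storage by rounding each tile through its chosen dtype."""
    dtypes = dict(PRECISIONS)
    B = A.copy()
    for i, row in enumerate(pmap):
        for j, name in enumerate(row):
            block = (slice(i * nb, (i + 1) * nb), slice(j * nb, (j + 1) * nb))
            B[block] = A[block].astype(dtypes[name]).astype(np.float64)
    return B


if __name__ == "__main__":
    A = exponential_covariance(np.linspace(0.0, 1.0, 256))
    pmap = tile_precision_map(A, nb=32)
    for row in pmap:
        print(" ".join(row))
    B = round_tiles(A, 32, pmap)
    L = np.linalg.cholesky(B)  # factorization still succeeds on the rounded matrix
    print("relative perturbation:", np.linalg.norm(B - A) / np.linalg.norm(A))
    print("Cholesky residual:", np.linalg.norm(L @ L.T - B) / np.linalg.norm(B))
```

Running the script prints an 8 x 8 precision map in which the chosen precision never increases with distance from the diagonal, mirroring how correlation decays with distance. Because each tile's rounding error is at most u times its norm, the rule keeps the total storage perturbation roughly within `eps_target` relative to the whole matrix; this is one common heuristic, and the paper's own precision-selection rule may differ.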
format Preprint
id arxiv_https___arxiv_org_abs_2408_04440
institution arXiv
publishDate 2024
record_format arxiv
title Boosting Earth System Model Outputs And Saving PetaBytes in their Storage Using Exascale Climate Emulators
topic Computation
url https://arxiv.org/abs/2408.04440