Bibliographic Details
Main Authors: Firoz, Jesun, Pellegrini, Franco, Geiger, Mario, Hsu, Darren, Bilbrey, Jenna A., Chou, Han-Yi, Stadler, Maximilian, Hoehnerbach, Markus, Wang, Tingyu, Lin, Dejun, Kucukbenli, Emine, Sprueill, Henry W., Batatia, Ilyes, Xantheas, Sotiris S., Lee, MalSoon, Mundy, Chris, Csanyi, Gabor, Smith, Justin S., Sadayappan, Ponnuswamy, Choudhury, Sutanay
Format: Preprint
Published: 2025
Subjects: Distributed, Parallel, and Cluster Computing; Artificial Intelligence
Online Access: https://arxiv.org/abs/2504.10700
author Firoz, Jesun
Pellegrini, Franco
Geiger, Mario
Hsu, Darren
Bilbrey, Jenna A.
Chou, Han-Yi
Stadler, Maximilian
Hoehnerbach, Markus
Wang, Tingyu
Lin, Dejun
Kucukbenli, Emine
Sprueill, Henry W.
Batatia, Ilyes
Xantheas, Sotiris S.
Lee, MalSoon
Mundy, Chris
Csanyi, Gabor
Smith, Justin S.
Sadayappan, Ponnuswamy
Choudhury, Sutanay
contents Chemistry Foundation Models (CFMs) that leverage Graph Neural Networks (GNNs) operating on 3D molecular graph structures are becoming indispensable tools for computational chemists and materials scientists. These models facilitate the understanding of matter and the discovery of new molecules and materials. In contrast to GNNs operating on large homogeneous graphs, GNNs used by CFMs process a large number of geometric graphs of varying sizes, requiring different optimization strategies than those developed for large homogeneous GNNs. This paper presents optimizations for two critical phases of CFM training, data distribution and model training, targeting MACE, a state-of-the-art CFM. We address the challenge of load balancing in data distribution by formulating it as a multi-objective bin packing problem. We propose an iterative algorithm that provides a highly effective, fast, and practical solution, ensuring efficient data distribution. For the training phase, we identify symmetric tensor contraction as the key computational kernel in MACE and optimize this kernel to improve the overall performance. Our combined approach of balanced data distribution and kernel optimization significantly enhances the training process of MACE. Experimental results demonstrate a substantial speedup, reducing per-epoch execution time for training from 12 to 2 minutes on 740 GPUs with a 2.6M sample dataset.
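The load-balancing idea in the abstract (distributing molecular graphs of varying sizes across GPUs, treated as a multi-objective bin packing problem) can be illustrated with a small sketch. The code below is not the paper's iterative algorithm, which this record does not describe. It substitutes a plain greedy "largest item first" heuristic, and the function names, the choice of atoms and edges as the two objectives, and the normalized cost are all assumptions made for illustration.

```python
import heapq
from typing import List, Tuple


def balance_graphs(sizes: List[Tuple[int, int]], num_bins: int) -> List[List[int]]:
    """Greedily assign graphs to bins (e.g. GPUs).

    sizes[i] is a hypothetical (num_atoms, num_edges) pair for graph i.
    Returns, for each bin, the list of graph indices assigned to it.
    """
    total_atoms = sum(a for a, _ in sizes) or 1
    total_edges = sum(e for _, e in sizes) or 1

    def cost(i: int) -> float:
        # Assumption: fold both objectives into one scalar by normalizing
        # each against its dataset-wide total.
        atoms, edges = sizes[i]
        return atoms / total_atoms + edges / total_edges

    # Place the most expensive graphs first.
    order = sorted(range(len(sizes)), key=cost, reverse=True)

    # Min-heap of (current load, bin index): the lightest bin is popped first.
    heap = [(0.0, b) for b in range(num_bins)]
    bins: List[List[int]] = [[] for _ in range(num_bins)]
    for i in order:
        load, b = heapq.heappop(heap)
        bins[b].append(i)
        heapq.heappush(heap, (load + cost(i), b))
    return bins


def bin_totals(sizes: List[Tuple[int, int]], bins: List[List[int]]) -> List[Tuple[int, int]]:
    """Per-bin (total atoms, total edges), useful for checking balance."""
    return [
        (sum(sizes[i][0] for i in members), sum(sizes[i][1] for i in members))
        for members in bins
    ]


if __name__ == "__main__":
    # Made-up graph sizes, for demonstration only.
    example = [(10, 40), (8, 30), (3, 6), (2, 4), (1, 2), (12, 50)]
    assignment = balance_graphs(example, num_bins=2)
    print(assignment)
    print(bin_totals(example, assignment))
```

A single scalar cost is the simplest way to respect two objectives at once. A real multi-objective formulation would more likely track each objective separately and refine the assignment over several passes.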
format Preprint
id arxiv_https___arxiv_org_abs_2504_10700
institution arXiv
publishDate 2025
record_format arxiv
title Optimizing Data Distribution and Kernel Performance for Efficient Training of Chemistry Foundation Models: A Case Study with MACE
topic Distributed, Parallel, and Cluster Computing
Artificial Intelligence
url https://arxiv.org/abs/2504.10700