Bibliographic Details
Main Authors: Latif, Imran; Newkirk, Alex C.; Carbone, Matthew R.; Munir, Arslan; Lin, Yuewei; Koomey, Jonathan; Yu, Xi; Dong, Zhihua
Format: Preprint
Published: 2024
Online Access: https://arxiv.org/abs/2412.08602
Summary:
  • The expansion of artificial intelligence (AI) applications has driven substantial investment in computational infrastructure, especially by cloud computing providers. Quantifying the energy footprint of this infrastructure requires models parameterized by the power demand of AI hardware during training. We empirically measured the instantaneous power draw of an 8-GPU NVIDIA H100 HGX node during the training of an open-source image classifier (ResNet) and a large language model (Llama2-13b). The maximum observed power draw was approximately 8.4 kW, 18% lower than the manufacturer-rated 10.2 kW, even with GPUs near full utilization. Holding model architecture constant, increasing batch size from 512 to 4096 images for ResNet reduced total training energy consumption by a factor of 4. These findings can inform capacity planning for data center operators and energy use estimates by researchers. Future work will investigate the impact of cooling technology and carbon-aware scheduling on AI workload energy consumption.
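
The two quantities the summary relies on, the gap between observed and rated power and the total energy of a training run, can be illustrated with a minimal Python sketch. Only the 8.4 kW and 10.2 kW figures come from the record. The function names, the trapezoidal integration, and the sample trace are hypothetical illustrations, not the authors' measurement pipeline.

```python
# Figures quoted in the summary above.
RATED_KW = 10.2     # manufacturer-rated node power
OBSERVED_KW = 8.4   # approximate maximum observed power draw


def percent_below_rated(observed_kw: float, rated_kw: float) -> float:
    """Return how far the observed draw sits below the rated power, in percent."""
    return 100.0 * (rated_kw - observed_kw) / rated_kw


def energy_kwh(times_s: list[float], power_kw: list[float]) -> float:
    """Integrate sampled power (kW) over time (s) with the trapezoidal rule.

    kW multiplied by seconds gives kJ; dividing by 3600 converts to kWh.
    """
    if len(times_s) != len(power_kw):
        raise ValueError("times_s and power_kw must have the same length")
    total_kj = 0.0
    for i in range(1, len(times_s)):
        dt = times_s[i] - times_s[i - 1]
        total_kj += 0.5 * (power_kw[i] + power_kw[i - 1]) * dt
    return total_kj / 3600.0


if __name__ == "__main__":
    # Gap between observed and rated power, rounded to one decimal place.
    print(round(percent_below_rated(OBSERVED_KW, RATED_KW), 1))

    # Hypothetical trace (NOT data from the paper): a constant 8 kW for one hour.
    print(energy_kwh([0.0, 1800.0, 3600.0], [8.0, 8.0, 8.0]))
```

The first value comes to about 17.6%, which the summary rounds to 18%. Because energy is the time integral of power, a configuration that finishes training sooner at a similar power draw consumes proportionally less energy, which is one way to read the reported batch-size result.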