Bibliographic Details
Main Authors: Rauba, Paulius; Cikojevic, Viktor; Bartolic, Fran; Levang, Sam; Dickinson, Ty; Dwelle, Chase
Format: Preprint
Published: 2026
Subjects: Machine Learning
Online Access: https://arxiv.org/abs/2601.03753
_version_ 1866915712882180096
author Rauba, Paulius
Cikojevic, Viktor
Bartolic, Fran
Levang, Sam
Dickinson, Ty
Dwelle, Chase
contents Weather forecasts sit upstream of high-stakes decisions in domains such as grid operations, aviation, agriculture, and emergency response. Yet forecast users often face a difficult trade-off. Many decision-relevant targets are functionals of the atmospheric state variables, such as extrema, accumulations, and threshold exceedances, rather than state variables themselves. As a result, users must estimate these targets via post-processing, which can be suboptimal and can introduce structural bias. The core issue is that decisions depend on distributions over these functionals that the model is not trained to learn directly. In this work, we introduce GEM-2, a probabilistic transformer that jointly learns global atmospheric dynamics alongside a suite of variables that users directly act upon. Using this training recipe, we show that a lightweight (~275M params) and computationally efficient (~20-100x training speedup relative to state-of-the-art) transformer trained on the CRPS objective can directly outperform operational numerical weather prediction (NWP) models and be competitive with ML models that rely on expensive multi-step diffusion processes or require bespoke multi-stage fine-tuning strategies. We further demonstrate state-of-the-art economic value metrics under decision-theoretic evaluation, stable convergence to climatology at S2S and seasonal timescales, and a surprising insensitivity to many commonly assumed architectural and training design choices.
format Preprint
id arxiv_https___arxiv_org_abs_2601_03753
institution arXiv
publishDate 2026
record_format arxiv
title Probabilistic Transformers for Joint Modeling of Global Weather Dynamics and Decision-Centric Variables
topic Machine Learning
url https://arxiv.org/abs/2601.03753
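Note on the abstract: two of its technical points can be illustrated with a small sketch. The first is that a functional such as a daily maximum, computed from an averaged forecast, generally differs from the same functional computed per ensemble member. The second is scoring an ensemble of that functional with the CRPS. The code below is a minimal illustration and is not taken from the paper: the function name `crps_ensemble`, the toy temperature values, and the choice of the standard empirical estimator E|X - y| - 0.5 E|X - X'| are all assumptions made for this example, and the record does not state which CRPS estimator GEM-2 uses.

```python
import numpy as np

def crps_ensemble(members, obs):
    """Empirical CRPS for one scalar observation.

    Uses the common estimator E|X - y| - 0.5 * E|X - X'|, with both
    expectations taken over the ensemble members (assumed choice).
    """
    members = np.asarray(members, dtype=float)
    # Mean absolute error of the members against the observation.
    term_obs = np.mean(np.abs(members - obs))
    # Half the mean absolute difference over all member pairs.
    term_spread = 0.5 * np.mean(np.abs(members[:, None] - members[None, :]))
    return term_obs - term_spread

# Hypothetical toy ensemble: 3 members x 4 hourly temperatures (deg C).
hourly = np.array([
    [10.0, 14.0, 12.0,  9.0],
    [11.0, 12.0, 15.0, 10.0],
    [13.0, 11.0, 12.0, 14.0],
])

# Functional applied per member: each member's own daily maximum.
daily_max_members = hourly.max(axis=1)        # [14, 15, 14]

# Post-processing shortcut: maximum of the ensemble-mean trajectory.
max_of_mean = hourly.mean(axis=0).max()       # 13.0
mean_of_max = daily_max_members.mean()        # about 14.33

# Score the ensemble of daily maxima against a hypothetical observed maximum.
observed_max = 14.0
score = crps_ensemble(daily_max_members, observed_max)

print(max_of_mean, mean_of_max, score)
```

In this toy case the maximum of the mean trajectory (13.0) underestimates the mean of the per-member maxima (about 14.33), which is the kind of structural bias the abstract attributes to post-processing. With a single member the estimator reduces to the absolute error, so lower values are better.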