Bibliographic Details
Main Authors: Garcia, Cyril; Remy, Guillaume
Format: Preprint
Published: 2026
Subjects: Machine Learning
Online Access: https://arxiv.org/abs/2602.09869
author Garcia, Cyril
Remy, Guillaume
contents We study the performance of transformer architectures for multivariate time-series forecasting in low-data regimes consisting of only a few years of daily observations. Using synthetically generated processes with known temporal and cross-sectional dependency structures and varying signal-to-noise ratios, we conduct bootstrapped experiments that enable direct evaluation via out-of-sample correlations with the optimal ground-truth predictor. We show that two-way attention transformers, which alternate between temporal and cross-sectional self-attention, can outperform standard baselines (Lasso, boosting methods, and fully connected multilayer perceptrons) across a wide range of settings, including low signal-to-noise regimes. We further introduce a dynamic sparsification procedure for attention matrices applied during training, and demonstrate that it becomes significantly effective in noisy environments, where the correlation between the target variable and the optimal predictor is on the order of a few percent. Analysis of the learned attention patterns reveals interpretable structure and suggests connections to sparsity-inducing regularization in classical regression, providing insight into why these models generalize effectively under noise.
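The evaluation idea in the abstract (scoring a model by its out-of-sample correlation with a known optimal predictor) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the i.i.d. linear data-generating process, the ordinary least-squares stand-in model, the sample sizes, and the function names `simulate`, `oos_corr_with_optimal`, and `mean_over_draws` are all hypothetical. Repeated independent draws stand in for the bootstrapped experiments the abstract mentions.

```python
import numpy as np


def simulate(n_obs, n_features, target_corr, rng):
    """Draw features, a noisy target, and the known optimal predictor.

    Assumption: a linear i.i.d. process, not the paper's generator.
    target_corr is the population correlation between the target and
    the optimal predictor (a stand-in for the signal-to-noise level).
    """
    X = rng.standard_normal((n_obs, n_features))
    beta = np.zeros(n_features)
    beta[:3] = 1.0 / np.sqrt(3.0)          # sparse truth, unit-variance signal
    optimal = X @ beta                     # best predictor of y given X
    # Noise level chosen so that corr(y, optimal) equals target_corr.
    noise_std = np.sqrt(1.0 / target_corr**2 - 1.0)
    y = optimal + noise_std * rng.standard_normal(n_obs)
    return X, y, optimal


def oos_corr_with_optimal(n_train=750, n_test=5000, n_features=20,
                          target_corr=0.05, seed=0):
    """Fit least squares on the training split, then correlate its
    out-of-sample predictions with the optimal predictor."""
    rng = np.random.default_rng(seed)
    X, y, optimal = simulate(n_train + n_test, n_features, target_corr, rng)
    coef, *_ = np.linalg.lstsq(X[:n_train], y[:n_train], rcond=None)
    pred = X[n_train:] @ coef
    return float(np.corrcoef(pred, optimal[n_train:])[0, 1])


def mean_over_draws(n_draws=20, **kwargs):
    """Average the score over independent synthetic draws."""
    scores = [oos_corr_with_optimal(seed=s, **kwargs) for s in range(n_draws)]
    return float(np.mean(scores))
```

Calling `mean_over_draws(target_corr=0.05)` and `mean_over_draws(target_corr=0.5)` gives a rough feel for how the score depends on the noise level in this toy setting. Correlating against the optimal predictor rather than the noisy target removes the irreducible noise from the score, which is only possible because the synthetic process is known.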
format Preprint
id arxiv_https___arxiv_org_abs_2602_09869
institution arXiv
publishDate 2026
record_format arxiv
title Statistical benchmarking of transformer models in low signal-to-noise time-series forecasting
topic Machine Learning
url https://arxiv.org/abs/2602.09869