Bibliographic Details
Main Authors: Singh, Pradeep; Sharma, Mehak; Dey, Anupriya; Raman, Balasubramanian
Format: Preprint
Published: 2025
Online Access: https://arxiv.org/abs/2508.18130
Abstract:
  • Transformers are the de facto choice for sequence modelling, yet their quadratic self-attention and weak temporal bias can make long-range forecasting both expensive and brittle. We introduce FreezeTST, a lightweight hybrid that interleaves frozen random-feature (reservoir) blocks with standard trainable Transformer layers. The frozen blocks endow the network with rich nonlinear memory at no optimisation cost; the trainable layers learn to query this memory through self-attention. The design cuts the number of trainable parameters and lowers wall-clock training time, while leaving inference complexity unchanged. On seven standard long-term forecasting benchmarks, FreezeTST consistently matches or surpasses specialised variants such as Informer, Autoformer, and PatchTST, with substantially lower compute. Our results show that embedding reservoir principles within Transformers offers a simple, principled route to efficient long-term time-series prediction.
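
The abstract does not say what form the frozen reservoir blocks take. As a rough illustration only, the sketch below assumes a leaky echo-state update with weights drawn once and never trained. The class name `FrozenReservoirBlock`, its parameters (`leak`, `radius_bound`, `seed`), and the toy input series are all hypothetical and not taken from the paper. In a hybrid like the one described, trainable Transformer layers would presumably attend over states like these.

```python
import math
import random


class FrozenReservoirBlock:
    """Leaky echo-state block with fixed random weights (hypothetical sketch)."""

    def __init__(self, d_in, d_res, leak=0.3, radius_bound=0.9, seed=0):
        rng = random.Random(seed)
        self.leak = leak
        # Input weights: drawn once, never updated by any optimiser.
        self.w_in = [[rng.uniform(-0.5, 0.5) for _ in range(d_in)]
                     for _ in range(d_res)]
        # Recurrent weights, rescaled so the largest absolute row sum equals
        # radius_bound. The spectral radius cannot exceed that norm, so the
        # state update below is a contraction (assumed design choice).
        w = [[rng.uniform(-1.0, 1.0) for _ in range(d_res)]
             for _ in range(d_res)]
        norm = max(sum(abs(v) for v in row) for row in w)
        self.w_res = [[v * radius_bound / norm for v in row] for row in w]

    def forward(self, sequence):
        """Map a list of input vectors to a list of reservoir states."""
        state = [0.0] * len(self.w_res)
        states = []
        for x in sequence:
            # Pre-activation: input drive plus recurrent memory term.
            pre = [
                sum(wi * xi for wi, xi in zip(row_in, x))
                + sum(wr * s for wr, s in zip(row_res, state))
                for row_in, row_res in zip(self.w_in, self.w_res)
            ]
            # Leaky integration blends the old state with the new activation.
            state = [
                (1.0 - self.leak) * s + self.leak * math.tanh(p)
                for s, p in zip(state, pre)
            ]
            states.append(state)
        return states


# Toy two-channel series; the values are illustrative, not benchmark data.
block = FrozenReservoirBlock(d_in=2, d_res=8)
series = [[math.sin(0.1 * t), math.cos(0.1 * t)] for t in range(50)]
states = block.forward(series)
print(len(states), len(states[0]))  # prints "50 8"
```

Because the block has no trainable weights, it adds nothing to the optimiser's parameter count, which is one plausible reading of "at no optimisation cost".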