Bibliographic Details
Main Authors: Liu, Fusheng, Li, Qianxiao
Format: Preprint
Published: 2024
Subjects: Machine Learning
Online Access: https://arxiv.org/abs/2405.02670
author Liu, Fusheng
Li, Qianxiao
contents A State Space Model (SSM) is a foundation model in time series analysis that has recently been shown to be an alternative to transformers in sequence modeling. In this paper, we theoretically study the generalization of SSMs and propose improvements to training algorithms based on the generalization results. Specifically, we give a data-dependent generalization bound for SSMs, showing an interplay between the SSM parameters and the temporal dependencies of the training sequences. Leveraging the generalization bound, we (1) set up a scaling rule for model initialization based on the proposed generalization measure, which significantly improves the robustness of the output value scales of SSMs to different temporal patterns in the sequence data; and (2) introduce a new regularization method for training SSMs to enhance generalization performance. Numerical experiments are conducted to validate our results.
format Preprint
id arxiv_https___arxiv_org_abs_2405_02670
institution arXiv
publishDate 2024
record_format arxiv
title From Generalization Analysis to Optimization Designs for State Space Models
topic Machine Learning
url https://arxiv.org/abs/2405.02670