Bibliographic Details
Main Authors: Honarpisheh, Arya; Bozdag, Mustafa; Camps, Octavia; Sznaier, Mario
Format: Preprint
Published: 2025
Subjects: Machine Learning
Online Access: https://arxiv.org/abs/2502.01473
_version_ 1866912686129807360
author Honarpisheh, Arya
Bozdag, Mustafa
Camps, Octavia
Sznaier, Mario
contents State-space models (SSMs) have recently emerged as a compelling alternative to Transformers for sequence modeling tasks. This paper presents a theoretical generalization analysis of selective SSMs, the core architectural component behind the Mamba model. We derive a novel covering-number-based generalization bound for selective SSMs, building upon recent theoretical advances in the analysis of Transformer models. Using this result, we analyze how the spectral abscissa of the continuous-time state matrix influences the model's stability during training and its ability to generalize across sequence lengths. We empirically validate our findings on a synthetic majority task, the IMDb sentiment classification benchmark, and the ListOps task, demonstrating how our theoretical insights translate into practical model behavior.
format Preprint
id arxiv_https___arxiv_org_abs_2502_01473
institution arXiv
publishDate 2025
record_format arxiv
title Generalization Error Analysis for Selective State-Space Models Through the Lens of Attention
topic Machine Learning
url https://arxiv.org/abs/2502.01473