Staff View: Generalization Error Analysis for Selective State-Space Models Through the Lens of Attention :: Library Catalog

Bibliographic Details
Main Authors:	Honarpisheh, Arya; Bozdag, Mustafa; Camps, Octavia; Sznaier, Mario
Format:	Preprint
Published:	2025
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2502.01473

_version_	1866912686129807360
author	Honarpisheh, Arya; Bozdag, Mustafa; Camps, Octavia; Sznaier, Mario
author_facet	Honarpisheh, Arya; Bozdag, Mustafa; Camps, Octavia; Sznaier, Mario
contents	State-space models (SSMs) have recently emerged as a compelling alternative to Transformers for sequence modeling tasks. This paper presents a theoretical generalization analysis of selective SSMs, the core architectural component behind the Mamba model. We derive a novel covering number-based generalization bound for selective SSMs, building upon recent theoretical advances in the analysis of Transformer models. Using this result, we analyze how the spectral abscissa of the continuous-time state matrix influences the model's stability during training and its ability to generalize across sequence lengths. We empirically validate our findings on a synthetic majority task, the IMDb sentiment classification benchmark, and the ListOps task, demonstrating how our theoretical insights translate into practical model behavior.
format	Preprint
id	arxiv_https___arxiv_org_abs_2502_01473
institution	arXiv
publishDate	2025
record_format	arxiv
title	Generalization Error Analysis for Selective State-Space Models Through the Lens of Attention
topic	Machine Learning
url	https://arxiv.org/abs/2502.01473