| Main Authors: | Honarpisheh, Arya; Bozdag, Mustafa; Camps, Octavia; Sznaier, Mario |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | Machine Learning |
| Online Access: | https://arxiv.org/abs/2502.01473 |
| _version_ | 1866912686129807360 |
|---|---|
| author | Honarpisheh, Arya; Bozdag, Mustafa; Camps, Octavia; Sznaier, Mario |
| contents | State-space models (SSMs) have recently emerged as a compelling alternative to Transformers for sequence modeling tasks. This paper presents a theoretical generalization analysis of selective SSMs, the core architectural component behind the Mamba model. We derive a novel covering number-based generalization bound for selective SSMs, building upon recent theoretical advances in the analysis of Transformer models. Using this result, we analyze how the spectral abscissa of the continuous-time state matrix influences the model's stability during training and its ability to generalize across sequence lengths. We empirically validate our findings on a synthetic majority task, the IMDb sentiment classification benchmark, and the ListOps task, demonstrating how our theoretical insights translate into practical model behavior. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2502_01473 |
| institution | arXiv |
| publishDate | 2025 |
| record_format | arxiv |
| title | Generalization Error Analysis for Selective State-Space Models Through the Lens of Attention |
| topic | Machine Learning |
| url | https://arxiv.org/abs/2502.01473 |
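
The abstract above ties stability to the spectral abscissa of the continuous-time state matrix, that is, the largest real part among its eigenvalues. The following is a minimal sketch of that idea, not the authors' code: it assumes a diagonal state matrix, a fixed step size `delta` (in a selective SSM the step size would depend on the input), and zero-order-hold discretization. All function names are hypothetical.

```python
import cmath


def spectral_abscissa(diag_entries):
    """Largest real part among the eigenvalues of a diagonal state matrix."""
    return max(complex(a).real for a in diag_entries)


def discretize_zoh(diag_entries, delta):
    """Zero-order-hold discretization of a diagonal A: a_bar = exp(delta * a).

    Assumption: delta is a fixed positive step size, used here only for illustration.
    """
    return [cmath.exp(delta * complex(a)) for a in diag_entries]


def state_gain_after(diag_entries, delta, steps):
    """Largest eigenvalue magnitude of A_bar ** steps.

    A value below 1 means every mode of the state decays over the sequence;
    a value above 1 means at least one mode grows with sequence length.
    """
    return max(abs(a_bar) ** steps for a_bar in discretize_zoh(diag_entries, delta))


# Hypothetical eigenvalues: one matrix with negative spectral abscissa, one with positive.
stable = [-0.5, -1.0 + 2.0j]
unstable = [0.1, -1.0]

for name, diag in (("stable", stable), ("unstable", unstable)):
    print(name, spectral_abscissa(diag), state_gain_after(diag, delta=0.1, steps=100))
```

Under these assumptions, each discretized eigenvalue has magnitude `exp(delta * Re(a))`, so a negative spectral abscissa keeps the state bounded as the sequence grows, while a positive one lets it grow exponentially with sequence length.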