Staff View: SpectralGuard: Detecting Memory Collapse Attacks in State Space Models :: Library Catalog

Bibliographic Details
Main Author:	Bonetto, Davi
Format:	Preprint
Published:	2026
Subjects:	Machine Learning; Cryptography and Security
Online Access:	https://arxiv.org/abs/2603.12414

_version_	1866910051447341056
author	Bonetto, Davi
author_facet	Bonetto, Davi
contents	State Space Models (SSMs) such as Mamba achieve linear-time sequence processing through input-dependent recurrence, but this mechanism introduces a critical safety vulnerability. We show that the spectral radius rho(A-bar) of the discretized transition operator governs effective memory horizon: when an adversary drives rho toward zero through gradient-based Hidden State Poisoning, memory collapses from millions of tokens to mere dozens, silently destroying reasoning capacity without triggering output-level alarms. We prove an Evasion Existence Theorem showing that for any output-only defense, adversarial inputs exist that simultaneously induce spectral collapse and evade detection, then introduce SpectralGuard, a real-time monitor that tracks spectral stability across all model layers. SpectralGuard achieves F1=0.961 against non-adaptive attackers and retains F1=0.842 under the strongest adaptive setting, with sub-15ms per-token latency. Causal interventions and cross-architecture transfer to hybrid SSM-Attention systems confirm that spectral monitoring provides a principled, deployable safety layer for recurrent foundation models.
format	Preprint
id	arxiv_https___arxiv_org_abs_2603_12414
institution	arXiv
publishDate	2026
record_format	arxiv
title	SpectralGuard: Detecting Memory Collapse Attacks in State Space Models
topic	Machine Learning; Cryptography and Security
url	https://arxiv.org/abs/2603.12414

Similar Items