Bibliographic Details
Main Authors: Wagh, Bhagyashree; Singh, Akash
Format: Preprint
Published: 2026
Subjects: Computation and Language; Machine Learning
Online Access: https://arxiv.org/abs/2605.00253
author Wagh, Bhagyashree
Singh, Akash
contents Mamba's recurrent state h_t is, by construction, a compressed summary of every token seen so far. This raises a tempting hypothesis: if we extract token-level outputs y_t at fixed patch boundaries, we obtain semantic sentence summaries for free, with no pooling head, no fine-tuning, and no [CLS] token. We test this hypothesis carefully. Across five benchmarks (SST-2, CoLA, MRPC, STS-B, IMDb), we compare four strategies for extracting frozen sentence representations from a pretrained Mamba-130M backbone under a strict frozen-feature probing protocol, using three random seeds where computationally feasible. The results do not support the hypothesis: patch boundary readouts do not consistently outperform simple mean pooling. We identify and quantify two structural pathologies: severe anisotropy (mean pairwise cosine similarity 0.9999, std 0.000044) and representational collapse in the raw final SSM state (MCC = 0.000 on CoLA across all three seeds, confirmed via confusion matrix). We further propose orthogonal injection, a modified recurrence that constrains new information per
format Preprint
id arxiv_https___arxiv_org_abs_2605_00253
institution arXiv
publishDate 2026
record_format arxiv
spellingShingle Lost in State Space: Probing Frozen Mamba Representations
Wagh, Bhagyashree
Singh, Akash
Computation and Language
Machine Learning
title Lost in State Space: Probing Frozen Mamba Representations
topic Computation and Language
Machine Learning
url https://arxiv.org/abs/2605.00253
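
The abstract reports two diagnostics: the mean and standard deviation of pairwise cosine similarity (as a measure of anisotropy) and the Matthews correlation coefficient (MCC) on CoLA. The following is a minimal sketch of how such statistics are commonly computed, not the authors' code. The function names, the toy vectors, and the toy labels are all hypothetical, and the convention of returning 0.0 when the MCC denominator is zero is an assumption made here. The constant-prediction case is shown only as one way an MCC of exactly 0 can arise; the record does not state what the paper's confusion matrix looked like.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pairwise_cosine_stats(embeddings):
    """Mean and population std of cosine similarity over all distinct pairs."""
    sims = [cosine(u, v) for u, v in combinations(embeddings, 2)]
    mean = sum(sims) / len(sims)
    var = sum((s - mean) ** 2 for s in sims) / len(sims)
    return mean, math.sqrt(var)


def binary_mcc(y_true, y_pred):
    """Matthews correlation coefficient for 0/1 labels.

    Returns 0.0 when the denominator is zero (assumed convention),
    which happens whenever the predictions or labels contain one class only.
    """
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical toy vectors sharing a large common offset: they are nearly
# parallel, so their pairwise cosine similarity is close to 1 (anisotropic).
toy = [[100.0 + dx, 100.0 + dy] for dx, dy in [(0.1, 0.0), (0.0, 0.1), (-0.1, 0.05)]]
mean_sim, std_sim = pairwise_cosine_stats(toy)

# Hypothetical probe that always predicts class 1: MCC is 0 even though
# its accuracy on these toy labels is 4/6.
labels = [1, 1, 1, 0, 1, 0]
constant_mcc = binary_mcc(labels, [1] * len(labels))
```

A mean similarity near 1 with a very small standard deviation means the vectors occupy a narrow cone, so cosine-based comparisons between sentences carry little signal. MCC is used here rather than accuracy because a probe that predicts a single class can still score well on accuracy when the labels are imbalanced.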