Staff View: Visual Attention Exploration in Vision-Based Mamba Models :: Library Catalog

Bibliographic Details
Main Authors:	Wang, Junpeng; Yeh, Chin-Chia Michael; Saini, Uday Singh; Das, Mahashweta
Format:	Preprint
Published:	2025
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2502.20764

_version_	1866915176572256256
author	Wang, Junpeng; Yeh, Chin-Chia Michael; Saini, Uday Singh; Das, Mahashweta
author_facet	Wang, Junpeng; Yeh, Chin-Chia Michael; Saini, Uday Singh; Das, Mahashweta
contents	State space models (SSMs) have emerged as an efficient alternative to transformer-based models, offering linear complexity that scales better than transformers. One of the latest advances in SSMs, Mamba, introduces a selective scan mechanism that assigns trainable weights to input tokens, effectively mimicking the attention mechanism. Mamba has also been successfully extended to the vision domain by decomposing 2D images into smaller patches and arranging them as 1D sequences. However, it remains unclear how these patches interact with (or attend to) each other in relation to their original 2D spatial location. Additionally, the order used to arrange the patches into a sequence also significantly impacts their attention distribution. To better understand the attention between patches and explore the attention patterns, we introduce a visual analytics tool specifically designed for vision-based Mamba models. This tool enables a deeper understanding of how attention is distributed across patches in different Mamba blocks and how it evolves throughout a Mamba model. Using the tool, we also investigate the impact of different patch-ordering strategies on the learned attention, offering further insights into the model's behavior.
format	Preprint
id	arxiv_https___arxiv_org_abs_2502_20764
institution	arXiv
publishDate	2025
record_format	arxiv
title	Visual Attention Exploration in Vision-Based Mamba Models
topic	Machine Learning
url	https://arxiv.org/abs/2502.20764

Similar Items