Bibliographic Details
Main Authors: Zou, Yuchen; Chen, Yineng; Li, Zuchao; Zhang, Lefei; Zhao, Hai
Format: Preprint
Published: 2024
Subjects: Computation and Language
Online Access: https://arxiv.org/abs/2406.16722
_version_ 1866910499809001472
author Zou, Yuchen
Chen, Yineng
Li, Zuchao
Zhang, Lefei
Zhao, Hai
contents Transformer, a deep neural network architecture, has long dominated the field of natural language processing and beyond. Nevertheless, the recent introduction of Mamba challenges its supremacy, sparks considerable interest among researchers, and gives rise to a series of Mamba-based models that have exhibited notable potential. This survey paper orchestrates a comprehensive discussion, diving into essential research dimensions, covering: (i) the functioning of the Mamba mechanism and its foundation on the principles of structured state space models; (ii) the proposed improvements and the integration of Mamba with various networks, exploring its potential as a substitute for Transformers; (iii) the combination of Transformers and Mamba to compensate for each other's shortcomings. We have also made efforts to interpret Mamba and Transformer in the framework of kernel functions, allowing for a comparison of their mathematical nature within a unified context. Our paper encompasses the vast majority of improvements related to Mamba to date.
format Preprint
id arxiv_https___arxiv_org_abs_2406_16722
institution arXiv
publishDate 2024
record_format arxiv
title Venturing into Uncharted Waters: The Navigation Compass from Transformer to Mamba
topic Computation and Language
url https://arxiv.org/abs/2406.16722