Bibliographic Details
Main Authors: Liu, Qinshuo; Zhao, Weiqin; Huang, Wei; Fang, Yanwen; Yu, Lequan; Li, Guodong
Format: Preprint
Published: 2025
Subjects: Machine Learning; Artificial Intelligence; Networking and Internet Architecture
Online Access: https://arxiv.org/abs/2502.10463
_version_ 1866916614636568576
author Liu, Qinshuo
Zhao, Weiqin
Huang, Wei
Fang, Yanwen
Yu, Lequan
Li, Guodong
contents The depth of neural networks is a critical factor in their capability, and deeper models often perform better. Motivated by this, significant effort has gone into enhancing layer aggregation, that is, reusing information from previous layers to better extract features at the current layer, in order to improve the representational power of deep neural networks. However, previous work has primarily addressed this problem from a discrete-state perspective, which becomes unsuitable as the number of network layers grows. This paper takes a novel approach: it treats the outputs of layers as states of a continuous process and leverages the state space model (SSM) to design layer aggregation for very deep neural networks. Moreover, inspired by its advances in modeling long sequences, the Selective State Space Model (S6) is used to design a new module called Selective State Space Model Layer Aggregation (S6LA). This module aims to combine traditional CNN or transformer architectures within a sequential framework, enhancing the representational capabilities of state-of-the-art vision networks. Extensive experiments show that S6LA delivers substantial improvements in both image classification and detection tasks, highlighting the potential of integrating SSMs with contemporary deep learning techniques.
format Preprint
id arxiv_https___arxiv_org_abs_2502_10463
institution arXiv
publishDate 2025
record_format arxiv
title From Layers to States: A State Space Model Perspective to Deep Neural Network Layer Dynamics
topic Machine Learning
Artificial Intelligence
Networking and Internet Architecture
url https://arxiv.org/abs/2502.10463