Bibliographic Details
Main authors: Han, Dongchen; Ye, Tianzhu; Xia, Zhuofan; Chen, Kaiyi; Wang, Yulin; Chen, Hanting; Huang, Gao
Format: Preprint
Published: 2025
Subjects: Computer Vision and Pattern Recognition
Online access: https://arxiv.org/abs/2511.14329
_version_ 1866911273975808000
author Han, Dongchen
Ye, Tianzhu
Xia, Zhuofan
Chen, Kaiyi
Wang, Yulin
Chen, Hanting
Huang, Gao
contents Scaling up network depth is a fundamental pursuit in neural architecture design, as theory suggests that deeper models offer exponentially greater capability. Benefiting from the residual connections, modern neural networks can scale up to more than one hundred layers and enjoy wide success. However, as networks continue to deepen, current architectures often struggle to realize their theoretical capacity improvements, calling for more advanced designs to further unleash the potential of deeper networks. In this paper, we identify two key barriers that obstruct residual models from scaling deeper: shortcut degradation and limited width. Shortcut degradation hinders deep-layer learning, while the inherent depth-width trade-off imposes limited width. To mitigate these issues, we propose a generalized residual architecture dubbed Step by Step Network (StepsNet) to bridge the gap between theoretical potential and practical performance of deep models. Specifically, we separate features along the channel dimension and let the model learn progressively via stacking blocks with increasing width. The resulting method mitigates the two identified problems and serves as a versatile macro design applicable to various models. Extensive experiments show that our method consistently outperforms residual models across diverse tasks, including image classification, object detection, semantic segmentation, and language modeling. These results position StepsNet as a superior generalization of the widely adopted residual architecture.
format Preprint
id arxiv_https___arxiv_org_abs_2511_14329
institution arXiv
publishDate 2025
record_format arxiv
title Step by Step Network
topic Computer Vision and Pattern Recognition
url https://arxiv.org/abs/2511.14329
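
The abstract's one-sentence description of the method (separate features along the channel dimension, then stack blocks of increasing width) can be illustrated with a rough sketch. This is not the authors' implementation: the record gives no layer details, so the residual MLP block, the split sizes, the stage depth, and the names `residual_block`, `make_stage`, and `steps_forward` are all assumptions chosen for illustration. It also assumes that each stage concatenates its output with the next unused channel group before continuing.

```python
import numpy as np

def residual_block(x, w1, w2):
    # Plain residual unit (assumed form): y = x + relu(x @ w1) @ w2
    return x + np.maximum(x @ w1, 0.0) @ w2

def make_stage(rng, width, depth):
    # A stage is `depth` residual blocks that all operate at the same width.
    return [
        (rng.normal(0.0, 0.02, (width, width)),
         rng.normal(0.0, 0.02, (width, width)))
        for _ in range(depth)
    ]

def steps_forward(x, splits, stages):
    # Split the channel axis into groups of the given sizes.
    chunks = np.split(x, np.cumsum(splits)[:-1], axis=-1)
    h = chunks[0]  # the first stage sees only the first channel group
    for i, stage in enumerate(stages):
        if i > 0:
            # Widen the working features with the next untouched group.
            h = np.concatenate([h, chunks[i]], axis=-1)
        for w1, w2 in stage:
            h = residual_block(h, w1, w2)
    return h

# Hypothetical configuration: 64 channels split 16 / 16 / 32,
# so the three stages run at widths 16, 32 and 64.
rng = np.random.default_rng(0)
splits = [16, 16, 32]
widths = np.cumsum(splits)
stages = [make_stage(rng, int(w), depth=2) for w in widths]

x = rng.normal(size=(4, 64))
out = steps_forward(x, splits, stages)
print(out.shape)  # (4, 64)
```

In this sketch the early, narrow stages only ever see a subset of the channels, and the final stage runs at the full width, so the output has the same shape as the input. How the actual StepsNet allocates depth and width per stage is specified in the paper, not in this record.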