Staff View :: Library Catalog

Bibliographic Details
Main Authors:	Li, Siyuan; Wang, Zedong; Liu, Zicheng; Tan, Cheng; Lin, Haitao; Wu, Di; Chen, Zhiyuan; Zheng, Jiangbin; Li, Stan Z.
Format:	Preprint
Published:	2022
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2211.03295

_version_	1866912430333886464
author	Li, Siyuan; Wang, Zedong; Liu, Zicheng; Tan, Cheng; Lin, Haitao; Wu, Di; Chen, Zhiyuan; Zheng, Jiangbin; Li, Stan Z.
author_facet	Li, Siyuan; Wang, Zedong; Liu, Zicheng; Tan, Cheng; Lin, Haitao; Wu, Di; Chen, Zhiyuan; Zheng, Jiangbin; Li, Stan Z.
contents	By contextualizing the kernel as globally as possible, modern ConvNets have shown great potential in computer vision tasks. However, recent progress on multi-order game-theoretic interaction within deep neural networks (DNNs) reveals the representation bottleneck of modern ConvNets, where the expressive interactions have not been effectively encoded with the increased kernel size. To tackle this challenge, we propose a new family of modern ConvNets, dubbed MogaNet, for discriminative visual representation learning in pure ConvNet-based models with favorable complexity-performance trade-offs. MogaNet encapsulates conceptually simple yet effective convolutions and gated aggregation into a compact module, where discriminative features are efficiently gathered and contextualized adaptively. MogaNet exhibits great scalability, impressive parameter efficiency, and competitive performance compared to state-of-the-art ViTs and ConvNets on ImageNet and various downstream vision benchmarks, including COCO object detection, ADE20K semantic segmentation, 2D and 3D human pose estimation, and video prediction. Notably, MogaNet reaches 80.0% and 87.8% accuracy with 5.2M and 181M parameters on ImageNet-1K, outperforming ParC-Net and ConvNeXt-L, while saving 59% FLOPs and 17M parameters, respectively. The source code is available at https://github.com/Westlake-AI/MogaNet.
format	Preprint
id	arxiv_https___arxiv_org_abs_2211_03295
institution	arXiv
publishDate	2022
record_format	arxiv
title	MogaNet: Multi-order Gated Aggregation Network
topic	Computer Vision and Pattern Recognition Artificial Intelligence
url	https://arxiv.org/abs/2211.03295