Internformat: :: Library Catalog

Gespeichert in:

Bibliographische Detailangaben
Hauptverfasser:	Zhang, Pan, Peng, Baochai, Lu, Chaoran, Huang, Quanjin
Format:	Preprint
Veröffentlicht:	2024
Schlagworte:	Image and Video Processing Computer Vision and Pattern Recognition
Online-Zugang:	https://arxiv.org/abs/2412.02044
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

_version_	1866929616630841344
author	Zhang, Pan Peng, Baochai Lu, Chaoran Huang, Quanjin
author_facet	Zhang, Pan Peng, Baochai Lu, Chaoran Huang, Quanjin
contents	Synthetic Aperture Radar (SAR) images have proven to be a valuable cue for multimodal Land Cover Classification (LCC) when combined with RGB images. Most existing studies on cross-modal fusion assume that consistent feature information is necessary between the two modalities, and as a result, they construct networks without adequately addressing the unique characteristics of each modality. In this paper, we propose a novel architecture, named the Asymmetric Semantic Aligning Network (ASANet), which introduces asymmetry at the feature level to address the issue that multi-modal architectures frequently fail to fully utilize complementary features. The core of this network is the Semantic Focusing Module (SFM), which explicitly calculates differential weights for each modality to account for the modality-specific features. Furthermore, ASANet incorporates a Cascade Fusion Module (CFM), which delves deeper into channel and spatial representations to efficiently select features from the two modalities for fusion. Through the collaborative effort of these two modules, the proposed ASANet effectively learns feature correlations between the two modalities and eliminates noise caused by feature differences. Comprehensive experiments demonstrate that ASANet achieves excellent performance on three multimodal datasets. Additionally, we have established a new RGB-SAR multimodal dataset, on which our ASANet outperforms other mainstream methods with improvements ranging from 1.21% to 17.69%. The ASANet runs at 48.7 frames per second (FPS) when the input image is 256x256 pixels. The source code are available at https://github.com/whu-pzhang/ASANet
format	Preprint
id	arxiv_https___arxiv_org_abs_2412_02044
institution	arXiv
publishDate	2024
record_format	arxiv
spellingShingle	ASANet: Asymmetric Semantic Aligning Network for RGB and SAR image land cover classification Zhang, Pan Peng, Baochai Lu, Chaoran Huang, Quanjin Image and Video Processing Computer Vision and Pattern Recognition Synthetic Aperture Radar (SAR) images have proven to be a valuable cue for multimodal Land Cover Classification (LCC) when combined with RGB images. Most existing studies on cross-modal fusion assume that consistent feature information is necessary between the two modalities, and as a result, they construct networks without adequately addressing the unique characteristics of each modality. In this paper, we propose a novel architecture, named the Asymmetric Semantic Aligning Network (ASANet), which introduces asymmetry at the feature level to address the issue that multi-modal architectures frequently fail to fully utilize complementary features. The core of this network is the Semantic Focusing Module (SFM), which explicitly calculates differential weights for each modality to account for the modality-specific features. Furthermore, ASANet incorporates a Cascade Fusion Module (CFM), which delves deeper into channel and spatial representations to efficiently select features from the two modalities for fusion. Through the collaborative effort of these two modules, the proposed ASANet effectively learns feature correlations between the two modalities and eliminates noise caused by feature differences. Comprehensive experiments demonstrate that ASANet achieves excellent performance on three multimodal datasets. Additionally, we have established a new RGB-SAR multimodal dataset, on which our ASANet outperforms other mainstream methods with improvements ranging from 1.21% to 17.69%. The ASANet runs at 48.7 frames per second (FPS) when the input image is 256x256 pixels. The source code are available at https://github.com/whu-pzhang/ASANet
title	ASANet: Asymmetric Semantic Aligning Network for RGB and SAR image land cover classification
topic	Image and Video Processing Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2412.02044

Ähnliche Einträge