Staff View: AG-Fusion: adaptive gated multimodal fusion for 3D object detection in complex scenes :: Library Catalog

Bibliographic Details
Main Authors:	Liu, Sixian; Xu, Chen; Wang, Qiang; Shi, Donghai; Li, Yiwen
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition; Machine Learning
Online Access:	https://arxiv.org/abs/2510.23151

_version_	1866911234528378880
author	Liu, Sixian; Xu, Chen; Wang, Qiang; Shi, Donghai; Li, Yiwen
author_facet	Liu, Sixian; Xu, Chen; Wang, Qiang; Shi, Donghai; Li, Yiwen
contents	Multimodal camera-LiDAR fusion technology has found extensive application in 3D object detection, demonstrating encouraging performance. However, existing methods exhibit significant performance degradation in challenging scenarios characterized by sensor degradation or environmental disturbances. We propose a novel Adaptive Gated Fusion (AG-Fusion) approach that selectively integrates cross-modal knowledge by identifying reliable patterns for robust detection in complex scenes. Specifically, we first project features from each modality into a unified BEV space and enhance them using a window-based attention mechanism. Subsequently, an adaptive gated fusion module based on cross-modal attention is designed to integrate these features into reliable BEV representations robust to challenging environments. Furthermore, we construct a new dataset named Excavator3D (E3D) focusing on challenging excavator operation scenarios to benchmark performance in complex conditions. Our method not only achieves competitive performance on the standard KITTI dataset with 93.92% accuracy, but also significantly outperforms the baseline by 24.88% on the challenging E3D dataset, demonstrating superior robustness to unreliable modal information in complex industrial scenes.
format	Preprint
id	arxiv_https___arxiv_org_abs_2510_23151
institution	arXiv
publishDate	2025
record_format	arxiv
title	AG-Fusion: adaptive gated multimodal fusion for 3D object detection in complex scenes
topic	Computer Vision and Pattern Recognition; Machine Learning
url	https://arxiv.org/abs/2510.23151
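
The contents field describes fusing camera and LiDAR features in a shared BEV grid through an adaptive gate. The record gives no implementation details, so the following is only a minimal sketch of the general idea of gated fusion, not the authors' module: it replaces the cross-modal attention named in the abstract with a plain learned sigmoid gate per BEV cell. The function names (`gated_fuse`, `fuse_bev`) and the parameters `w_cam`, `w_lidar`, and `bias` are hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def gated_fuse(cam, lidar, w_cam, w_lidar, bias):
    """Fuse two feature vectors from one BEV cell with a scalar gate.

    Assumption: the gate is a sigmoid over a linear score of both
    modalities. The paper's gate is attention-based; this is a stand-in.
    """
    score = bias
    for c, l, wc, wl in zip(cam, lidar, w_cam, w_lidar):
        score += wc * c + wl * l
    gate = sigmoid(score)  # near 1 favors camera, near 0 favors LiDAR
    fused = [gate * c + (1.0 - gate) * l for c, l in zip(cam, lidar)]
    return gate, fused


def fuse_bev(cam_bev, lidar_bev, w_cam, w_lidar, bias):
    """Apply the gate independently at every cell of a 2D BEV grid."""
    return [
        [gated_fuse(c, l, w_cam, w_lidar, bias)[1] for c, l in zip(cam_row, lidar_row)]
        for cam_row, lidar_row in zip(cam_bev, lidar_bev)
    ]


if __name__ == "__main__":
    cam = [[[1.0, 3.0]]]    # 1x1 grid, 2 channels (hypothetical values)
    lidar = [[[3.0, 1.0]]]
    print(fuse_bev(cam, lidar, [0.0, 0.0], [0.0, 0.0], 0.0))
```

With zero weights and zero bias the gate is 0.5 and the fused feature is the average of the two modalities; a strongly negative score pushes the output toward the LiDAR feature, which is the behavior one would want when the camera input is unreliable.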

Similar Items