Bibliographic Details
Main Authors: Liu, Sixian, Xu, Chen, Wang, Qiang, Shi, Donghai, Li, Yiwen
Format: Preprint
Published: 2025
Subjects: Computer Vision and Pattern Recognition; Machine Learning
Online Access: https://arxiv.org/abs/2510.23151
author Liu, Sixian
Xu, Chen
Wang, Qiang
Shi, Donghai
Li, Yiwen
contents Multimodal camera-LiDAR fusion technology has found extensive application in 3D object detection, demonstrating encouraging performance. However, existing methods exhibit significant performance degradation in challenging scenarios characterized by sensor degradation or environmental disturbances. We propose a novel Adaptive Gated Fusion (AG-Fusion) approach that selectively integrates cross-modal knowledge by identifying reliable patterns for robust detection in complex scenes. Specifically, we first project features from each modality into a unified BEV space and enhance them using a window-based attention mechanism. Subsequently, an adaptive gated fusion module based on cross-modal attention is designed to integrate these features into reliable BEV representations robust to challenging environments. Furthermore, we construct a new dataset named Excavator3D (E3D) focusing on challenging excavator operation scenarios to benchmark performance in complex conditions. Our method not only achieves competitive performance on the standard KITTI dataset with 93.92% accuracy, but also significantly outperforms the baseline by 24.88% on the challenging E3D dataset, demonstrating superior robustness to unreliable modal information in complex industrial scenes.
format Preprint
id arxiv_https___arxiv_org_abs_2510_23151
institution arXiv
publishDate 2025
record_format arxiv
title AG-Fusion: adaptive gated multimodal fusion for 3D object detection in complex scenes
topic Computer Vision and Pattern Recognition
Machine Learning
url https://arxiv.org/abs/2510.23151
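
The abstract above describes fusing camera and LiDAR features in a shared BEV space through an adaptive gate. As a rough illustration of the general idea only, the sketch below blends two BEV feature maps cell by cell with a sigmoid gate. It is not the authors' AG-Fusion module: the paper derives its gate from cross-modal attention, whereas this sketch assumes a plain linear score, and the names `gated_fuse`, `w_cam`, `w_lidar`, and `bias` are hypothetical.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real score to a gate value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fuse(cam, lidar, w_cam, w_lidar, bias):
    """Blend two BEV feature maps of shape [H][W][C], one cell at a time.

    Assumption: the gate is a single scalar per BEV cell, computed from a
    linear score over both modalities. A gate near 1 trusts the camera
    features; a gate near 0 trusts the LiDAR features.
    """
    fused = []
    for cam_row, lidar_row in zip(cam, lidar):
        fused_row = []
        for c, l in zip(cam_row, lidar_row):
            # Linear score over the concatenated camera and LiDAR features.
            score = (
                sum(wc * ci for wc, ci in zip(w_cam, c))
                + sum(wl * li for wl, li in zip(w_lidar, l))
                + bias
            )
            g = sigmoid(score)
            # Convex combination of the two modalities for this cell.
            fused_row.append([g * ci + (1.0 - g) * li for ci, li in zip(c, l)])
        fused.append(fused_row)
    return fused


# A 1x1 BEV grid with 2 channels per modality.
cam = [[[1.0, 2.0]]]
lidar = [[[3.0, 4.0]]]

# Zero weights and zero bias give a gate of 0.5, i.e. a plain average.
balanced = gated_fuse(cam, lidar, [0.0, 0.0], [0.0, 0.0], 0.0)

# A strongly negative bias pushes the gate toward 0, so LiDAR dominates.
lidar_heavy = gated_fuse(cam, lidar, [0.0, 0.0], [0.0, 0.0], -50.0)
```

In a trained model the weights would be learned, so the gate could shift toward whichever modality is more reliable in a given cell, for example toward LiDAR when the camera image is degraded.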