Staff View: Explicit Interaction for Fusion-Based Place Recognition :: Library Catalog

Bibliographic Details
Main Authors:	Xu, Jingyi; Ma, Junyi; Wu, Qi; Zhou, Zijie; Wang, Yue; Chen, Xieyuanli; Pei, Ling
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition; Robotics
Online Access:	https://arxiv.org/abs/2402.17264

_version_	1866910345534111744
author	Xu, Jingyi; Ma, Junyi; Wu, Qi; Zhou, Zijie; Wang, Yue; Chen, Xieyuanli; Pei, Ling
author_facet	Xu, Jingyi; Ma, Junyi; Wu, Qi; Zhou, Zijie; Wang, Yue; Chen, Xieyuanli; Pei, Ling
contents	Fusion-based place recognition is an emerging technique that jointly uses multi-modal perception data to recognize previously visited places in GPS-denied scenarios for robots and autonomous vehicles. Recent fusion-based place recognition methods combine multi-modal features implicitly. While achieving remarkable results, they do not explicitly consider what each individual modality affords in the fusion system. Therefore, the benefit of multi-modal feature fusion may not be fully explored. In this paper, we propose a novel fusion-based network, dubbed EINet, to achieve explicit interaction of the two modalities. EINet uses LiDAR ranges to supervise more robust vision features for long time spans, and simultaneously uses camera RGB data to improve the discrimination of LiDAR point clouds. In addition, we develop a new benchmark for the place recognition task based on the nuScenes dataset. To establish this benchmark for future research with comprehensive comparisons, we introduce both supervised and self-supervised training schemes alongside evaluation protocols. We conduct extensive experiments on the proposed benchmark, and the experimental results show that our EINet exhibits better recognition performance as well as solid generalization ability compared to the state-of-the-art fusion-based place recognition approaches. Our open-source code and benchmark are released at: https://github.com/BIT-XJY/EINet.
format	Preprint
id	arxiv_https___arxiv_org_abs_2402_17264
institution	arXiv
publishDate	2024
record_format	arxiv
title	Explicit Interaction for Fusion-Based Place Recognition
topic	Computer Vision and Pattern Recognition; Robotics
url	https://arxiv.org/abs/2402.17264

Similar Items