Saved in:
| Main Authors: | , |
|---|---|
| Format: | Preprint |
| Published: |
2024
|
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2403.18178 |
| Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
| _version_ | 1866916179679903744 |
|---|---|
| author | Taguchi, Shun Deguchi, Hideki |
| author_facet | Taguchi, Shun Deguchi, Hideki |
| contents | This study introduces a novel approach to online embedding of multi-scale CLIP (Contrastive Language-Image Pre-Training) features into 3D maps. By harnessing CLIP, this methodology surpasses the constraints of conventional vocabulary-limited methods and enables the incorporation of semantic information into the resultant maps. While recent approaches have explored the embedding of multi-modal features in maps, they often impose significant computational costs, lacking practicality for exploring unfamiliar environments in real time. Our approach tackles these challenges by efficiently computing and embedding multi-scale CLIP features, thereby facilitating the exploration of unfamiliar environments through real-time map generation. Moreover, the embedding CLIP features into the resultant maps makes offline retrieval via linguistic queries feasible. In essence, our approach simultaneously achieves real-time object search and mapping of unfamiliar environments. Additionally, we propose a zero-shot object-goal navigation system based on our mapping approach, and we validate its efficacy through object-goal navigation, offline object retrieval, and multi-object-goal navigation in both simulated environments and real robot experiments. The findings demonstrate that our method not only exhibits swifter performance than state-of-the-art mapping methods but also surpasses them in terms of the success rate of object-goal navigation tasks. |
| format | Preprint |
| id |
arxiv_https___arxiv_org_abs_2403_18178 |
| institution | arXiv |
| publishDate | 2024 |
| record_format | arxiv |
| spellingShingle | Online Embedding Multi-Scale CLIP Features into 3D Maps Taguchi, Shun Deguchi, Hideki Robotics Computer Vision and Pattern Recognition This study introduces a novel approach to online embedding of multi-scale CLIP (Contrastive Language-Image Pre-Training) features into 3D maps. By harnessing CLIP, this methodology surpasses the constraints of conventional vocabulary-limited methods and enables the incorporation of semantic information into the resultant maps. While recent approaches have explored the embedding of multi-modal features in maps, they often impose significant computational costs, lacking practicality for exploring unfamiliar environments in real time. Our approach tackles these challenges by efficiently computing and embedding multi-scale CLIP features, thereby facilitating the exploration of unfamiliar environments through real-time map generation. Moreover, the embedding CLIP features into the resultant maps makes offline retrieval via linguistic queries feasible. In essence, our approach simultaneously achieves real-time object search and mapping of unfamiliar environments. Additionally, we propose a zero-shot object-goal navigation system based on our mapping approach, and we validate its efficacy through object-goal navigation, offline object retrieval, and multi-object-goal navigation in both simulated environments and real robot experiments. The findings demonstrate that our method not only exhibits swifter performance than state-of-the-art mapping methods but also surpasses them in terms of the success rate of object-goal navigation tasks. |
| title | Online Embedding Multi-Scale CLIP Features into 3D Maps |
| topic | Robotics Computer Vision and Pattern Recognition |
| url | https://arxiv.org/abs/2403.18178 |