Bibliographic Details
Main Authors: Xu, Ming; Xie, Zilong
Format: Preprint
Published: 2024
Online Access: https://arxiv.org/abs/2403.11541
author Xu, Ming
Xie, Zilong
contents Most Vision-and-Language Navigation (VLN) algorithms are prone to making inaccurate decisions due to their lack of visual common sense and limited reasoning capabilities. To address this issue, we propose a Hierarchical Spatial Proximity Reasoning (HSPR) method. First, we introduce a scene understanding auxiliary task to help the agent build a knowledge base of hierarchical spatial proximity. This task uses panoramic views and object features to identify node types and uncover adjacency relationships among nodes, among objects, and between nodes and objects. Second, we propose a multi-step reasoning navigation algorithm based on the hierarchical spatial proximity knowledge base, which continuously plans feasible paths to enhance exploration efficiency. Third, we introduce a residual fusion method to improve navigation decision accuracy. Finally, we validate our approach with experiments on publicly available datasets including REVERIE, SOON, R2R, and R4R. Our code is available at https://github.com/iCityLab/HSPR
format Preprint
id arxiv_https___arxiv_org_abs_2403_11541
institution arXiv
publishDate 2024
record_format arxiv
title Hierarchical Spatial Proximity Reasoning for Vision-and-Language Navigation
topic Computer Vision and Pattern Recognition
url https://arxiv.org/abs/2403.11541
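
The abstract describes planning multi-step paths over a knowledge base of spatial proximity. The sketch below illustrates that general idea only; it is not the authors' HSPR implementation. The room names, the object-to-room table, and the function plan_path are all hypothetical, and plain breadth-first search stands in for whatever reasoning procedure the paper actually uses.

```python
from collections import deque

# Hypothetical toy knowledge base: which room types tend to be adjacent.
# These entries are illustrative assumptions, not data from the paper.
ROOM_ADJACENCY = {
    "hallway": ["living room", "kitchen", "bedroom"],
    "living room": ["hallway", "kitchen"],
    "kitchen": ["hallway", "living room", "dining room"],
    "dining room": ["kitchen"],
    "bedroom": ["hallway", "bathroom"],
    "bathroom": ["bedroom"],
}

# Hypothetical object-to-room proximity: rooms where an object is likely found.
OBJECT_ROOMS = {
    "sink": ["kitchen", "bathroom"],
    "bed": ["bedroom"],
}


def plan_path(start, target_object,
              adjacency=ROOM_ADJACENCY, object_rooms=OBJECT_ROOMS):
    """Return the shortest room sequence from `start` to any room likely to
    contain `target_object`, or None if no such room is known or reachable."""
    goals = set(object_rooms.get(target_object, []))
    if not goals:
        return None
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] in goals:
            return path
        for nxt in adjacency.get(path[-1], []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None


print(plan_path("living room", "sink"))
print(plan_path("dining room", "bed"))
```

In this toy setting, an agent in the dining room that is asked to find a bed reasons over several hops (dining room, kitchen, hallway, bedroom) rather than choosing only among immediate neighbors.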