| Main Authors: | Gao, Xinyu; Chen, Gang; Alonso-Mora, Javier |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2603.09961 |
Similar Items
Learning Environment-Aware Affordance for 3D Articulated Object Manipulation under Occlusions
by: Wu, Ruihai, et al.
Published: (2023)
RoboPoint: A Vision-Language Model for Spatial Affordance Prediction for Robotics
by: Yuan, Wentao, et al.
Published: (2024)
Language-Conditioned World Modeling for Visual Navigation
by: Dong, Yifei, et al.
Published: (2026)
MoMa-Kitchen: A 100K+ Benchmark for Affordance-Grounded Last-Mile Navigation in Mobile Manipulation
by: Zhang, Pingrui, et al.
Published: (2025)
WorldMAP: Bootstrapping Vision-Language Navigation Trajectory Prediction with Generative World Models
by: Chen, Hongjin, et al.
Published: (2026)
ADAPT: Benchmarking Commonsense Planning under Unspecified Affordance Constraints
by: Chen, Pei-An, et al.
Published: (2026)
General Flow as Foundation Affordance for Scalable Robot Learning
by: Yuan, Chengbo, et al.
Published: (2024)
RAAP: Retrieval-Augmented Affordance Prediction with Cross-Image Action Alignment
by: Zhuang, Qiyuan, et al.
Published: (2026)
Vision-Language Navigation with Embodied Intelligence: A Survey
by: Gao, Peng, et al.
Published: (2024)
Grounding 3D Object Affordance with Language Instructions, Visual Observations and Interactions
by: Zhu, He, et al.
Published: (2025)
OMEGA: Efficient Occlusion-Aware Navigation for Air-Ground Robot in Dynamic Environments via State Space Model
by: Wang, Junming, et al.
Published: (2024)
Scene Informer: Anchor-based Occlusion Inference and Trajectory Prediction in Partially Observable Environments
by: Lange, Bernard, et al.
Published: (2023)
UAD: Unsupervised Affordance Distillation for Generalization in Robotic Manipulation
by: Tang, Yihe, et al.
Published: (2025)
SE-VLN: A Self-Evolving Vision-Language Navigation Framework Based on Multimodal Large Language Models
by: Dong, Xiangyu, et al.
Published: (2025)
Schrödinger's Navigator: Imagining an Ensemble of Futures for Zero-Shot Object Navigation
by: He, Yu, et al.
Published: (2025)
GarmentPile: Point-Level Visual Affordance Guided Retrieval and Adaptation for Cluttered Garments Manipulation
by: Wu, Ruihai, et al.
Published: (2025)
MobileOcc: A Human-Aware Semantic Occupancy Dataset for Mobile Robots
by: Kim, Junseo, et al.
Published: (2025)
MTA-RL: Robust Urban Driving via Multi-modal Transformer-based 3D Affordances and Reinforcement Learning
by: Chen, Guangli, et al.
Published: (2026)
RT-Affordance: Affordances are Versatile Intermediate Representations for Robot Manipulation
by: Nasiriany, Soroush, et al.
Published: (2024)
OctoNav: Towards Generalist Embodied Navigation
by: Gao, Chen, et al.
Published: (2025)
MapDream: Task-Driven Map Learning for Vision-Language Navigation
by: Lian, Guoxin, et al.
Published: (2026)
DINO-CVA: A Multimodal Goal-Conditioned Vision-to-Action Model for Autonomous Catheter Navigation
by: Fekri, Pedram, et al.
Published: (2025)
TP-MDDN: Task-Preferenced Multi-Demand-Driven Navigation with Autonomous Decision-Making
by: Li, Shanshan, et al.
Published: (2025)
A Navigation Framework Utilizing Vision-Language Models
by: Duan, Yicheng, et al.
Published: (2025)
AgriVLN: Vision-and-Language Navigation for Agricultural Robots
by: Zhao, Xiaobei, et al.
Published: (2025)
VLABench: A Large-Scale Benchmark for Language-Conditioned Robotics Manipulation with Long-Horizon Reasoning Tasks
by: Zhang, Shiduo, et al.
Published: (2024)
MapGPT: Map-Guided Prompting with Adaptive Path Planning for Vision-and-Language Navigation
by: Chen, Jiaqi, et al.
Published: (2024)
AffordTissue: Dense Affordance Prediction for Tool-Action Specific Tissue Interaction
by: Maksutova, Aiza, et al.
Published: (2026)
LatentPilot: Scene-Aware Vision-and-Language Navigation by Dreaming Ahead with Latent Visual Reasoning
by: Hao, Haihong, et al.
Published: (2026)
Information-driven Affordance Discovery for Efficient Robotic Manipulation
by: Mazzaglia, Pietro, et al.
Published: (2024)
What Limits Vision-and-Language Navigation?
by: Wang, Yunheng, et al.
Published: (2026)
TopV-Nav: Unlocking the Top-View Spatial Reasoning Potential of MLLM for Zero-shot Object Navigation
by: Zhong, Linqing, et al.
Published: (2024)
Online Language Splatting
by: Katragadda, Saimouli, et al.
Published: (2025)
SpatialNav: Leveraging Spatial Scene Graphs for Zero-Shot Vision-and-Language Navigation
by: Zhang, Jiwen, et al.
Published: (2026)
Dream to Recall: Imagination-Guided Experience Retrieval for Memory-Persistent Vision-and-Language Navigation
by: Xu, Yunzhe, et al.
Published: (2025)
Human-Aware Vision-and-Language Navigation: Bridging Simulation to Reality with Dynamic Human Interactions
by: Li, Heng, et al.
Published: (2024)
Vision-and-Language Navigation Generative Pretrained Transformer
by: Hanlin, Wen
Published: (2024)
ActiveVLN: Towards Active Exploration via Multi-Turn RL in Vision-and-Language Navigation
by: Zhang, Zekai, et al.
Published: (2025)
Tag Map: A Text-Based Map for Spatial Reasoning and Navigation with Large Language Models
by: Zhang, Mike, et al.
Published: (2024)
Human-like Navigation in a World Built for Humans
by: Chandaka, Bhargav, et al.
Published: (2025)