:: Library Catalog

Bibliographic Details
Main Authors:	Gao, Xinyu; Chen, Gang; Alonso-Mora, Javier
Format:	Preprint
Published:	2026
Subjects:	Robotics; Artificial Intelligence; Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2603.09961

Similar Items

Learning Environment-Aware Affordance for 3D Articulated Object Manipulation under Occlusions
by: Wu, Ruihai, et al.
Published: (2023)

RoboPoint: A Vision-Language Model for Spatial Affordance Prediction for Robotics
by: Yuan, Wentao, et al.
Published: (2024)

Language-Conditioned World Modeling for Visual Navigation
by: Dong, Yifei, et al.
Published: (2026)

MoMa-Kitchen: A 100K+ Benchmark for Affordance-Grounded Last-Mile Navigation in Mobile Manipulation
by: Zhang, Pingrui, et al.
Published: (2025)

WorldMAP: Bootstrapping Vision-Language Navigation Trajectory Prediction with Generative World Models
by: Chen, Hongjin, et al.
Published: (2026)

ADAPT: Benchmarking Commonsense Planning under Unspecified Affordance Constraints
by: Chen, Pei-An, et al.
Published: (2026)

General Flow as Foundation Affordance for Scalable Robot Learning
by: Yuan, Chengbo, et al.
Published: (2024)

RAAP: Retrieval-Augmented Affordance Prediction with Cross-Image Action Alignment
by: Zhuang, Qiyuan, et al.
Published: (2026)

Vision-Language Navigation with Embodied Intelligence: A Survey
by: Gao, Peng, et al.
Published: (2024)

Grounding 3D Object Affordance with Language Instructions, Visual Observations and Interactions
by: Zhu, He, et al.
Published: (2025)

OMEGA: Efficient Occlusion-Aware Navigation for Air-Ground Robot in Dynamic Environments via State Space Model
by: Wang, Junming, et al.
Published: (2024)

Scene Informer: Anchor-based Occlusion Inference and Trajectory Prediction in Partially Observable Environments
by: Lange, Bernard, et al.
Published: (2023)

UAD: Unsupervised Affordance Distillation for Generalization in Robotic Manipulation
by: Tang, Yihe, et al.
Published: (2025)

SE-VLN: A Self-Evolving Vision-Language Navigation Framework Based on Multimodal Large Language Models
by: Dong, Xiangyu, et al.
Published: (2025)

Schrödinger's Navigator: Imagining an Ensemble of Futures for Zero-Shot Object Navigation
by: He, Yu, et al.
Published: (2025)

GarmentPile: Point-Level Visual Affordance Guided Retrieval and Adaptation for Cluttered Garments Manipulation
by: Wu, Ruihai, et al.
Published: (2025)

MobileOcc: A Human-Aware Semantic Occupancy Dataset for Mobile Robots
by: Kim, Junseo, et al.
Published: (2025)

MTA-RL: Robust Urban Driving via Multi-modal Transformer-based 3D Affordances and Reinforcement Learning
by: Chen, Guangli, et al.
Published: (2026)

RT-Affordance: Affordances are Versatile Intermediate Representations for Robot Manipulation
by: Nasiriany, Soroush, et al.
Published: (2024)

OctoNav: Towards Generalist Embodied Navigation
by: Gao, Chen, et al.
Published: (2025)

MapDream: Task-Driven Map Learning for Vision-Language Navigation
by: Lian, Guoxin, et al.
Published: (2026)

DINO-CVA: A Multimodal Goal-Conditioned Vision-to-Action Model for Autonomous Catheter Navigation
by: Fekri, Pedram, et al.
Published: (2025)

TP-MDDN: Task-Preferenced Multi-Demand-Driven Navigation with Autonomous Decision-Making
by: Li, Shanshan, et al.
Published: (2025)

A Navigation Framework Utilizing Vision-Language Models
by: Duan, Yicheng, et al.
Published: (2025)

AgriVLN: Vision-and-Language Navigation for Agricultural Robots
by: Zhao, Xiaobei, et al.
Published: (2025)

VLABench: A Large-Scale Benchmark for Language-Conditioned Robotics Manipulation with Long-Horizon Reasoning Tasks
by: Zhang, Shiduo, et al.
Published: (2024)

MapGPT: Map-Guided Prompting with Adaptive Path Planning for Vision-and-Language Navigation
by: Chen, Jiaqi, et al.
Published: (2024)

AffordTissue: Dense Affordance Prediction for Tool-Action Specific Tissue Interaction
by: Maksutova, Aiza, et al.
Published: (2026)

LatentPilot: Scene-Aware Vision-and-Language Navigation by Dreaming Ahead with Latent Visual Reasoning
by: Hao, Haihong, et al.
Published: (2026)

Information-driven Affordance Discovery for Efficient Robotic Manipulation
by: Mazzaglia, Pietro, et al.
Published: (2024)

What Limits Vision-and-Language Navigation?
by: Wang, Yunheng, et al.
Published: (2026)

TopV-Nav: Unlocking the Top-View Spatial Reasoning Potential of MLLM for Zero-shot Object Navigation
by: Zhong, Linqing, et al.
Published: (2024)

Online Language Splatting
by: Katragadda, Saimouli, et al.
Published: (2025)

SpatialNav: Leveraging Spatial Scene Graphs for Zero-Shot Vision-and-Language Navigation
by: Zhang, Jiwen, et al.
Published: (2026)

Dream to Recall: Imagination-Guided Experience Retrieval for Memory-Persistent Vision-and-Language Navigation
by: Xu, Yunzhe, et al.
Published: (2025)

Human-Aware Vision-and-Language Navigation: Bridging Simulation to Reality with Dynamic Human Interactions
by: Li, Heng, et al.
Published: (2024)

Vision-and-Language Navigation Generative Pretrained Transformer
by: Hanlin, Wen
Published: (2024)

ActiveVLN: Towards Active Exploration via Multi-Turn RL in Vision-and-Language Navigation
by: Zhang, Zekai, et al.
Published: (2025)

Tag Map: A Text-Based Map for Spatial Reasoning and Navigation with Large Language Models
by: Zhang, Mike, et al.
Published: (2024)

Human-like Navigation in a World Built for Humans
by: Chandaka, Bhargav, et al.
Published: (2025)