
Bibliographic Details
Main Authors:	Lin, Sihao; Li, Zerui; Zhao, Xunyi; Zhou, Gengze; Wang, Liuyi; Wei, Rong; Tang, Rui; Li, Juncheng; Wang, Hanqing; Pang, Jiangmiao; Hengel, Anton van den; Liu, Jiajun; Wu, Qi
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition; Robotics
Online Access:	https://arxiv.org/abs/2512.19021

Similar Items

Rethinking the Embodied Gap in Vision-and-Language Navigation: A Holistic Study of Physical and Visual Disparities
by: Wang, Liuyi, et al.
Published: (2025)

VLN-MME: Diagnosing MLLMs as Language-guided Visual Navigation agents
by: Zhao, Xunyi, et al.
Published: (2025)

Decoupled Action Expert: Confining Task Knowledge to the Conditioning Pathway
by: Zhou, Jian, et al.
Published: (2025)

Ground-level Viewpoint Vision-and-Language Navigation in Continuous Environments
by: Li, Zerui, et al.
Published: (2025)

LabUtopia: High-Fidelity Simulation and Hierarchical Benchmark for Scientific Embodied Agents
by: Li, Rui, et al.
Published: (2025)

OVExp: Open Vocabulary Exploration for Object-Oriented Navigation
by: Wei, Meng, et al.
Published: (2024)

Embodied Navigation Foundation Model
by: Zhang, Jiazhao, et al.
Published: (2025)

Language-to-Space Programming for Training-Free 3D Visual Grounding
by: Mi, Boyu, et al.
Published: (2025)

Tac2Real: Reliable and GPU Visuotactile Simulation for Online Reinforcement Learning and Zero-Shot Real-World Deployment
by: Yan, Ningyu, et al.
Published: (2026)

Towards Physically Executable 3D Gaussian for Embodied Navigation
by: Miao, Bingchen, et al.
Published: (2025)

Towards Realistic UAV Vision-Language Navigation: Platform, Benchmark, and Methodology
by: Wang, Xiangyu, et al.
Published: (2024)

EgoSim: Egocentric World Simulator for Embodied Interaction Generation
by: Hao, Jinkun, et al.
Published: (2026)

One Agent to Guide Them All: Empowering MLLMs for Vision-and-Language Navigation via Explicit World Representation
by: Li, Zerui, et al.
Published: (2026)

StreamVLN: Streaming Vision-and-Language Navigation via SlowFast Context Modeling
by: Wei, Meng, et al.
Published: (2025)

InternScenes: A Large-scale Simulatable Indoor Scene Dataset with Realistic Layouts
by: Zhong, Weipeng, et al.
Published: (2025)

NavGPT-2: Unleashing Navigational Reasoning Capability for Large Vision-Language Models
by: Zhou, Gengze, et al.
Published: (2024)

NavDP: Learning Sim-to-Real Navigation Diffusion Policy with Privileged Information Guidance
by: Cai, Wenzhe, et al.
Published: (2025)

VL-LN Bench: Towards Long-horizon Goal-oriented Navigation with Active Dialogs
by: Huang, Wensi, et al.
Published: (2025)

InstructVLA: Vision-Language-Action Instruction Tuning from Understanding to Manipulation
by: Yang, Shuai, et al.
Published: (2025)

GENMANIP: LLM-driven Simulation for Generalizable Instruction-Following Manipulation
by: Gao, Ning, et al.
Published: (2025)

TeleOpBench: A Simulator-Centric Benchmark for Dual-Arm Dexterous Teleoperation
by: Li, Hangyu, et al.
Published: (2025)

Can You Learn to See Without Images? Procedural Warm-Up for Vision Transformers
by: Shinnick, Zachary, et al.
Published: (2025)

Learning Goal-Oriented Vision-and-Language Navigation with Self-Improving Demonstrations at Scale
by: Li, Songze, et al.
Published: (2025)

BadNAVer: Exploring Jailbreak Attacks On Vision-and-Language Navigation
by: Lyu, Wenqi, et al.
Published: (2025)

RClicks: Realistic Click Simulation for Benchmarking Interactive Segmentation
by: Antonov, Anton, et al.
Published: (2024)

NavTrust: Benchmarking Trustworthiness for Embodied Navigation
by: Jiang, Huaide, et al.
Published: (2026)

MLANet: Multi-Level Attention Network with Sub-instruction for Continuous Vision-and-Language Navigation
by: He, Zongtao, et al.
Published: (2023)

Vision-and-Language Navigation via Causal Learning
by: Wang, Liuyi, et al.
Published: (2024)

A Unified and General Humanoid Whole-Body Controller for Versatile Locomotion
by: Xue, Yufei, et al.
Published: (2025)

Hierarchical Process Reward Models are Symbolic Vision Learners
by: Zhang, Shan, et al.
Published: (2025)

CronusVLA: Towards Efficient and Robust Manipulation via Multi-Frame Vision-Language-Action Modeling
by: Li, Hao, et al.
Published: (2025)

BEHAVIOR-1K: A Human-Centered, Embodied AI Benchmark with 1,000 Everyday Activities and Realistic Simulation
by: Li, Chengshu, et al.
Published: (2024)

PASTS: Progress-Aware Spatio-Temporal Transformer Speaker For Vision-and-Language Navigation
by: Wang, Liuyi, et al.
Published: (2023)

Embodied Agent Interface: Benchmarking LLMs for Embodied Decision Making
by: Li, Manling, et al.
Published: (2024)

Pediatric Chronic Monteggia Fractures: Insights From a Comprehensive Review
by: Li, Gengze, et al.
Published: (2025)

Hybrid Internal Model: Learning Agile Legged Locomotion with Simulated Robot Response
by: Long, Junfeng, et al.
Published: (2023)

LangMap: A Human-Verified Benchmark for Hierarchical Open-Vocabulary Goal Navigation
by: Miao, Bo, et al.
Published: (2026)

Fast-SmartWay: Panoramic-Free End-to-End Zero-Shot Vision-and-Language Navigation
by: Shi, Xiangyu, et al.
Published: (2025)

Confident Sinkhorn Allocation for Pseudo-Labeling
by: Nguyen, Vu, et al.
Published: (2022)

Points-to-3D: Structure-Aware 3D Generation with Point Cloud Priors
by: Xia, Jiatong, et al.
Published: (2026)