| Main Authors: | Lin, Sihao, Li, Zerui, Zhao, Xunyi, Zhou, Gengze, Wang, Liuyi, Wei, Rong, Tang, Rui, Li, Juncheng, Wang, Hanqing, Pang, Jiangmiao, Hengel, Anton van den, Liu, Jiajun, Wu, Qi |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2512.19021 |
Similar Items
Rethinking the Embodied Gap in Vision-and-Language Navigation: A Holistic Study of Physical and Visual Disparities
by: Wang, Liuyi, et al.
Published: (2025)
VLN-MME: Diagnosing MLLMs as Language-guided Visual Navigation agents
by: Zhao, Xunyi, et al.
Published: (2025)
Decoupled Action Expert: Confining Task Knowledge to the Conditioning Pathway
by: Zhou, Jian, et al.
Published: (2025)
Ground-level Viewpoint Vision-and-Language Navigation in Continuous Environments
by: Li, Zerui, et al.
Published: (2025)
LabUtopia: High-Fidelity Simulation and Hierarchical Benchmark for Scientific Embodied Agents
by: Li, Rui, et al.
Published: (2025)
OVExp: Open Vocabulary Exploration for Object-Oriented Navigation
by: Wei, Meng, et al.
Published: (2024)
Embodied Navigation Foundation Model
by: Zhang, Jiazhao, et al.
Published: (2025)
Language-to-Space Programming for Training-Free 3D Visual Grounding
by: Mi, Boyu, et al.
Published: (2025)
Tac2Real: Reliable and GPU Visuotactile Simulation for Online Reinforcement Learning and Zero-Shot Real-World Deployment
by: Yan, Ningyu, et al.
Published: (2026)
Towards Physically Executable 3D Gaussian for Embodied Navigation
by: Miao, Bingchen, et al.
Published: (2025)
Towards Realistic UAV Vision-Language Navigation: Platform, Benchmark, and Methodology
by: Wang, Xiangyu, et al.
Published: (2024)
EgoSim: Egocentric World Simulator for Embodied Interaction Generation
by: Hao, Jinkun, et al.
Published: (2026)
One Agent to Guide Them All: Empowering MLLMs for Vision-and-Language Navigation via Explicit World Representation
by: Li, Zerui, et al.
Published: (2026)
StreamVLN: Streaming Vision-and-Language Navigation via SlowFast Context Modeling
by: Wei, Meng, et al.
Published: (2025)
InternScenes: A Large-scale Simulatable Indoor Scene Dataset with Realistic Layouts
by: Zhong, Weipeng, et al.
Published: (2025)
NavGPT-2: Unleashing Navigational Reasoning Capability for Large Vision-Language Models
by: Zhou, Gengze, et al.
Published: (2024)
NavDP: Learning Sim-to-Real Navigation Diffusion Policy with Privileged Information Guidance
by: Cai, Wenzhe, et al.
Published: (2025)
VL-LN Bench: Towards Long-horizon Goal-oriented Navigation with Active Dialogs
by: Huang, Wensi, et al.
Published: (2025)
InstructVLA: Vision-Language-Action Instruction Tuning from Understanding to Manipulation
by: Yang, Shuai, et al.
Published: (2025)
GENMANIP: LLM-driven Simulation for Generalizable Instruction-Following Manipulation
by: Gao, Ning, et al.
Published: (2025)
TeleOpBench: A Simulator-Centric Benchmark for Dual-Arm Dexterous Teleoperation
by: Li, Hangyu, et al.
Published: (2025)
Can You Learn to See Without Images? Procedural Warm-Up for Vision Transformers
by: Shinnick, Zachary, et al.
Published: (2025)
Learning Goal-Oriented Vision-and-Language Navigation with Self-Improving Demonstrations at Scale
by: Li, Songze, et al.
Published: (2025)
BadNAVer: Exploring Jailbreak Attacks On Vision-and-Language Navigation
by: Lyu, Wenqi, et al.
Published: (2025)
RClicks: Realistic Click Simulation for Benchmarking Interactive Segmentation
by: Antonov, Anton, et al.
Published: (2024)
NavTrust: Benchmarking Trustworthiness for Embodied Navigation
by: Jiang, Huaide, et al.
Published: (2026)
MLANet: Multi-Level Attention Network with Sub-instruction for Continuous Vision-and-Language Navigation
by: He, Zongtao, et al.
Published: (2023)
Vision-and-Language Navigation via Causal Learning
by: Wang, Liuyi, et al.
Published: (2024)
A Unified and General Humanoid Whole-Body Controller for Versatile Locomotion
by: Xue, Yufei, et al.
Published: (2025)
Hierarchical Process Reward Models are Symbolic Vision Learners
by: Zhang, Shan, et al.
Published: (2025)
CronusVLA: Towards Efficient and Robust Manipulation via Multi-Frame Vision-Language-Action Modeling
by: Li, Hao, et al.
Published: (2025)
BEHAVIOR-1K: A Human-Centered, Embodied AI Benchmark with 1,000 Everyday Activities and Realistic Simulation
by: Li, Chengshu, et al.
Published: (2024)
PASTS: Progress-Aware Spatio-Temporal Transformer Speaker For Vision-and-Language Navigation
by: Wang, Liuyi, et al.
Published: (2023)
Embodied Agent Interface: Benchmarking LLMs for Embodied Decision Making
by: Li, Manling, et al.
Published: (2024)
Pediatric Chronic Monteggia Fractures: Insights From a Comprehensive Review
by: Gengze Li, et al.
Published: (2025)
Hybrid Internal Model: Learning Agile Legged Locomotion with Simulated Robot Response
by: Long, Junfeng, et al.
Published: (2023)
LangMap: A Human-Verified Benchmark for Hierarchical Open-Vocabulary Goal Navigation
by: Miao, Bo, et al.
Published: (2026)
Fast-SmartWay: Panoramic-Free End-to-End Zero-Shot Vision-and-Language Navigation
by: Shi, Xiangyu, et al.
Published: (2025)
Confident Sinkhorn Allocation for Pseudo-Labeling
by: Nguyen, Vu, et al.
Published: (2022)
Points-to-3D: Structure-Aware 3D Generation with Point Cloud Priors
by: Xia, Jiatong, et al.
Published: (2026)