Bibliographic Details
Main Authors:	Byun, Ye Won; Jiao, Cathy; Noroozizadeh, Shahriar; Sun, Jimin; Vitiello, Rosa
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence; Computation and Language; Machine Learning; Robotics
Online Access:	https://arxiv.org/abs/2406.17876

Similar Items

DivScene: Towards Open-Vocabulary Object Navigation with Large Vision Language Models in Diverse Scenes
by: Wang, Zhaowei, et al.
Published: (2024)

DVMNet++: Rethinking Relative Pose Estimation for Unseen Objects
by: Zhao, Chen, et al.
Published: (2024)

Unseen from Seen: Rewriting Observation-Instruction Using Foundation Models for Augmenting Vision-Language Navigation
by: Wei, Ziming, et al.
Published: (2025)

Scalable Unseen Objects 6-DoF Absolute Pose Estimation with Robotic Integration
by: Liu, Jian, et al.
Published: (2025)

Dream-SLAM: Dreaming the Unseen for Active SLAM in Dynamic Environments
by: Meng, Xiangqi, et al.
Published: (2026)

FoundPose: Unseen Object Pose Estimation with Foundation Features
by: Örnek, Evin Pınar, et al.
Published: (2023)

Adapting Segment Anything Model for Unseen Object Instance Segmentation
by: Cao, Rui, et al.
Published: (2024)

NextBestPath: Efficient 3D Mapping of Unseen Environments
by: Li, Shiyao, et al.
Published: (2025)

Navigating Beyond Instructions: Vision-and-Language Navigation in Obstructed Environments
by: Hong, Haodong, et al.
Published: (2024)

ETPNav: Evolving Topological Planning for Vision-Language Navigation in Continuous Environments
by: An, Dong, et al.
Published: (2023)

UAS Visual Navigation in Large and Unseen Environments via a Meta Agent
by: Han, Yuci, et al.
Published: (2025)

LocPoseNet: Robust Location Prior for Unseen Object Pose Estimation
by: Zhao, Chen, et al.
Published: (2022)

Enhancing Underwater Object Detection through Spatio-Temporal Analysis and Spatial Attention Networks
by: Karri, Sai Likhith, et al.
Published: (2025)

DM2RM: Dual-Mode Multimodal Ranking for Target Objects and Receptacles Based on Open-Vocabulary Instructions
by: Korekata, Ryosuke, et al.
Published: (2024)

ArtReg: Visuo-Tactile based Pose Tracking and Manipulation of Unseen Articulated Objects
by: Murali, Prajval Kumar, et al.
Published: (2025)

BOP Challenge 2023 on Detection, Segmentation and Pose Estimation of Seen and Unseen Rigid Objects
by: Hodan, Tomas, et al.
Published: (2024)

Holodeck: Language Guided Generation of 3D Embodied AI Environments
by: Yang, Yue, et al.
Published: (2023)

Segmenting Known Objects and Unseen Unknowns without Prior Knowledge
by: Gasperini, Stefano, et al.
Published: (2022)

FLAME: Learning to Navigate with Multimodal LLM in Urban Environments
by: Xu, Yunzhe, et al.
Published: (2024)

Ground-level Viewpoint Vision-and-Language Navigation in Continuous Environments
by: Li, Zerui, et al.
Published: (2025)

AgentThink: A Unified Framework for Tool-Augmented Chain-of-Thought Reasoning in Vision-Language Models for Autonomous Driving
by: Qian, Kangan, et al.
Published: (2025)

CAMON: Cooperative Agents for Multi-Object Navigation with LLM-based Conversations
by: Wu, Pengying, et al.
Published: (2024)

CLIP-Loc: Multi-modal Landmark Association for Global Localization in Object-based Maps
by: Matsuzaki, Shigemichi, et al.
Published: (2024)

GFreeDet: Exploiting Gaussian Splatting and Foundation Models for Model-free Unseen Object Detection in the BOP Challenge 2024
by: Liu, Xingyu, et al.
Published: (2024)

InstructNav: Zero-shot System for Generic Instruction Navigation in Unexplored Environment
by: Long, Yuxing, et al.
Published: (2024)

Which One? Leveraging Context Between Objects and Multiple Views for Language Grounding
by: Mitra, Chancharik, et al.
Published: (2023)

VLN-NF: Feasibility-Aware Vision-and-Language Navigation with False-Premise Instructions
by: Su, Hung-Ting, et al.
Published: (2026)

Language-Based Augmentation to Address Shortcut Learning in Object Goal Navigation
by: Hoftijzer, Dennis, et al.
Published: (2024)

The Better You Learn, The Smarter You Prune: Towards Efficient Vision-language-action Models via Differentiable Token Pruning
by: Jiang, Titong, et al.
Published: (2025)

Open-vocabulary Mobile Manipulation in Unseen Dynamic Environments with 3D Semantic Maps
by: Qiu, Dicong, et al.
Published: (2024)

UncertaintyTrack: Exploiting Detection and Localization Uncertainty in Multi-Object Tracking
by: Lee, Chang Won, et al.
Published: (2024)

MiMo-Embodied: X-Embodied Foundation Model Technical Report
by: Hao, Xiaoshuai, et al.
Published: (2025)

Robotic-CLIP: Fine-tuning CLIP on Action Data for Robotic Applications
by: Nguyen, Nghia, et al.
Published: (2024)

Xiaomi OneVL: One-Step Latent Reasoning and Planning with Vision-Language Explanation
by: Lu, Jinghui, et al.
Published: (2026)

Virtual Community: An Open World for Humans, Robots, and Society
by: Zhou, Qinhong, et al.
Published: (2025)

CLIP-Clique: Graph-based Correspondence Matching Augmented by Vision Language Models for Object-based Global Localization
by: Matsuzaki, Shigemichi, et al.
Published: (2024)

Online Mapping for Autonomous Driving: Addressing Sensor Generalization and Dynamic Map Updates in Campus Environments
by: Zhang, Zihan, et al.
Published: (2025)

Learning from Trials and Errors: Reflective Test-Time Planning for Embodied LLMs
by: Hong, Yining, et al.
Published: (2026)

A Language Agent for Autonomous Driving
by: Mao, Jiageng, et al.
Published: (2023)

An Efficient Method for Accurate Pose Estimation and Error Correction of Cuboidal Objects
by: Rai, Utsav, et al.
Published: (2025)