:: Library Catalog

Bibliographic Details
Main Authors:	Yashima, Daichi; Korekata, Ryosuke; Sugiura, Komei
Format:	Preprint
Published:	2024
Subjects:	Robotics; Computation and Language; Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2412.16576

Similar Items

Affordance RAG: Hierarchical Multimodal Retrieval with Affordance-Aware Embodied Memory for Mobile Manipulation
by: Korekata, Ryosuke, et al.
Published: (2025)

DM2RM: Dual-Mode Multimodal Ranking for Target Objects and Receptacles Based on Open-Vocabulary Instructions
by: Korekata, Ryosuke, et al.
Published: (2024)

NaiLIA: Multimodal Nail Design Retrieval Based on Dense Intent Descriptions and Palette Queries
by: Amemiya, Kanon, et al.
Published: (2026)

Task Success Prediction for Open-Vocabulary Manipulation Based on Multi-Level Aligned Representations
by: Goko, Miyu, et al.
Published: (2024)

Future Success Prediction in Open-Vocabulary Object Manipulation Tasks Based on End-Effector Trajectories
by: Kambara, Motonari, et al.
Published: (2024)

Mobile Manipulation Instruction Generation from Multiple Images with Automatic Metric Enhancement
by: Katsumata, Kei, et al.
Published: (2025)

Object Segmentation from Open-Vocabulary Manipulation Instructions Based on Optimal Transport Polygon Matching with Multimodal Foundation Models
by: Nishimura, Takayuki, et al.
Published: (2024)

MLLM-as-a-Judge Exhibits Model Preference Bias
by: Koyama, Shuitsu, et al.
Published: (2026)

ReMoRa: Multimodal Large Language Model based on Refined Motion Representation for Long-Video Understanding
by: Yashima, Daichi, et al.
Published: (2026)

Polos: Multimodal Metric Learning from Human Feedback for Image Captioning
by: Wada, Yuiga, et al.
Published: (2024)

ABMAMBA: Multimodal Large Language Model with Aligned Hierarchical Bidirectional Scan for Efficient Video Captioning
by: Yashima, Daichi, et al.
Published: (2026)

HomeRobot: Open-Vocabulary Mobile Manipulation
by: Yenamandra, Sriram, et al.
Published: (2023)

Towards Open-World Mobile Manipulation in Homes: Lessons from the Neurips 2023 HomeRobot Open Vocabulary Mobile Manipulation Challenge
by: Yenamandra, Sriram, et al.
Published: (2024)

HomeRobot Open Vocabulary Mobile Manipulation Challenge 2023 Participant Report (Team KuzHum)
by: Kuzma, Volodymyr, et al.
Published: (2024)

DivScene: Towards Open-Vocabulary Object Navigation with Large Vision Language Models in Diverse Scenes
by: Wang, Zhaowei, et al.
Published: (2024)

DENEB: A Hallucination-Robust Automatic Evaluation Metric for Image Captioning
by: Matsuda, Kazuki, et al.
Published: (2024)

GENNAV: Polygon Mask Generation for Generalized Referring Navigable Regions
by: Katsumata, Kei, et al.
Published: (2025)

Learning Humanoid End-Effector Control for Open-Vocabulary Visual Loco-Manipulation
by: Dong, Runpei, et al.
Published: (2026)

Scene-Graph ViT: End-to-End Open-Vocabulary Visual Relationship Detection
by: Salzmann, Tim, et al.
Published: (2024)

Context-based Motion Retrieval using Open Vocabulary Methods for Autonomous Driving
by: Englmeier, Stefan, et al.
Published: (2025)

Closed-Loop Open-Vocabulary Mobile Manipulation with GPT-4V
by: Zhi, Peiyuan, et al.
Published: (2024)

Polaris: Open-ended Interactive Robotic Manipulation via Syn2Real Visual Grounding and Large Language Models
by: Wang, Tianyu, et al.
Published: (2024)

ZINA: Multimodal Fine-grained Hallucination Detection and Editing
by: Wada, Yuiga, et al.
Published: (2025)

Semantic-Drive: Democratizing Long-Tail Data Curation via Open-Vocabulary Grounding and Neuro-Symbolic VLM Consensus
by: Guillen-Perez, Antonio
Published: (2025)

ABot-M0: VLA Foundation Model for Robotic Manipulation with Action Manifold Learning
by: Yang, Yandan, et al.
Published: (2026)

Splat-MOVER: Multi-Stage, Open-Vocabulary Robotic Manipulation via Editable Gaussian Splatting
by: Shorinwa, Ola, et al.
Published: (2024)

Real-world Instance-specific Image Goal Navigation: Bridging Domain Gaps via Contrastive Learning
by: Sakaguchi, Taichi, et al.
Published: (2024)

Hierarchical Open-Vocabulary 3D Scene Graphs for Language-Grounded Robot Navigation
by: Werby, Abdelrhman, et al.
Published: (2024)

RoboOmni: Proactive Robot Manipulation in Omni-modal Context
by: Wang, Siyin, et al.
Published: (2025)

VELA: An LLM-Hybrid-as-a-Judge Approach for Evaluating Long Image Captions
by: Matsuda, Kazuki, et al.
Published: (2025)

RoboTrustBench: Benchmarking the Trustworthiness of Video World Models for Robotic Manipulation
by: Li, Huiqiong, et al.
Published: (2026)

AIRoA MoMa Dataset: A Large-Scale Hierarchical Dataset for Mobile Manipulation
by: Takanami, Ryosuke, et al.
Published: (2025)

AnySkill: Learning Open-Vocabulary Physical Skill for Interactive Agents
by: Cui, Jieming, et al.
Published: (2024)

GROVE: A Generalized Reward for Learning Open-Vocabulary Physical Skill
by: Cui, Jieming, et al.
Published: (2025)

RoboUniView: Visual-Language Model with Unified View Representation for Robotic Manipulation
by: Liu, Fanfan, et al.
Published: (2024)

Open-Vocabulary Online Semantic Mapping for SLAM
by: Martins, Tomas Berriel, et al.
Published: (2024)

LOVON: Legged Open-Vocabulary Object Navigator
by: Peng, Daojie, et al.
Published: (2025)

Virtual Community: An Open World for Humans, Robots, and Society
by: Zhou, Qinhong, et al.
Published: (2025)

OpenESS: Event-based Semantic Scene Understanding with Open Vocabularies
by: Kong, Lingdong, et al.
Published: (2024)

WildOS: Open-Vocabulary Object Search in the Wild
by: Shah, Hardik, et al.
Published: (2026)