| Main Authors: | Yashima, Daichi, Korekata, Ryosuke, Sugiura, Komei |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2412.16576 |
Similar Items
Affordance RAG: Hierarchical Multimodal Retrieval with Affordance-Aware Embodied Memory for Mobile Manipulation
by: Korekata, Ryosuke, et al.
Published: (2025)
DM2RM: Dual-Mode Multimodal Ranking for Target Objects and Receptacles Based on Open-Vocabulary Instructions
by: Korekata, Ryosuke, et al.
Published: (2024)
NaiLIA: Multimodal Nail Design Retrieval Based on Dense Intent Descriptions and Palette Queries
by: Amemiya, Kanon, et al.
Published: (2026)
Task Success Prediction for Open-Vocabulary Manipulation Based on Multi-Level Aligned Representations
by: Goko, Miyu, et al.
Published: (2024)
Future Success Prediction in Open-Vocabulary Object Manipulation Tasks Based on End-Effector Trajectories
by: Kambara, Motonari, et al.
Published: (2024)
Mobile Manipulation Instruction Generation from Multiple Images with Automatic Metric Enhancement
by: Katsumata, Kei, et al.
Published: (2025)
Object Segmentation from Open-Vocabulary Manipulation Instructions Based on Optimal Transport Polygon Matching with Multimodal Foundation Models
by: Nishimura, Takayuki, et al.
Published: (2024)
MLLM-as-a-Judge Exhibits Model Preference Bias
by: Koyama, Shuitsu, et al.
Published: (2026)
ReMoRa: Multimodal Large Language Model based on Refined Motion Representation for Long-Video Understanding
by: Yashima, Daichi, et al.
Published: (2026)
Polos: Multimodal Metric Learning from Human Feedback for Image Captioning
by: Wada, Yuiga, et al.
Published: (2024)
ABMAMBA: Multimodal Large Language Model with Aligned Hierarchical Bidirectional Scan for Efficient Video Captioning
by: Yashima, Daichi, et al.
Published: (2026)
HomeRobot: Open-Vocabulary Mobile Manipulation
by: Yenamandra, Sriram, et al.
Published: (2023)
Towards Open-World Mobile Manipulation in Homes: Lessons from the Neurips 2023 HomeRobot Open Vocabulary Mobile Manipulation Challenge
by: Yenamandra, Sriram, et al.
Published: (2024)
HomeRobot Open Vocabulary Mobile Manipulation Challenge 2023 Participant Report (Team KuzHum)
by: Kuzma, Volodymyr, et al.
Published: (2024)
DivScene: Towards Open-Vocabulary Object Navigation with Large Vision Language Models in Diverse Scenes
by: Wang, Zhaowei, et al.
Published: (2024)
DENEB: A Hallucination-Robust Automatic Evaluation Metric for Image Captioning
by: Matsuda, Kazuki, et al.
Published: (2024)
GENNAV: Polygon Mask Generation for Generalized Referring Navigable Regions
by: Katsumata, Kei, et al.
Published: (2025)
Learning Humanoid End-Effector Control for Open-Vocabulary Visual Loco-Manipulation
by: Dong, Runpei, et al.
Published: (2026)
Scene-Graph ViT: End-to-End Open-Vocabulary Visual Relationship Detection
by: Salzmann, Tim, et al.
Published: (2024)
Context-based Motion Retrieval using Open Vocabulary Methods for Autonomous Driving
by: Englmeier, Stefan, et al.
Published: (2025)
Closed-Loop Open-Vocabulary Mobile Manipulation with GPT-4V
by: Zhi, Peiyuan, et al.
Published: (2024)
Polaris: Open-ended Interactive Robotic Manipulation via Syn2Real Visual Grounding and Large Language Models
by: Wang, Tianyu, et al.
Published: (2024)
ZINA: Multimodal Fine-grained Hallucination Detection and Editing
by: Wada, Yuiga, et al.
Published: (2025)
Semantic-Drive: Democratizing Long-Tail Data Curation via Open-Vocabulary Grounding and Neuro-Symbolic VLM Consensus
by: Guillen-Perez, Antonio
Published: (2025)
ABot-M0: VLA Foundation Model for Robotic Manipulation with Action Manifold Learning
by: Yang, Yandan, et al.
Published: (2026)
Splat-MOVER: Multi-Stage, Open-Vocabulary Robotic Manipulation via Editable Gaussian Splatting
by: Shorinwa, Ola, et al.
Published: (2024)
Real-world Instance-specific Image Goal Navigation: Bridging Domain Gaps via Contrastive Learning
by: Sakaguchi, Taichi, et al.
Published: (2024)
Hierarchical Open-Vocabulary 3D Scene Graphs for Language-Grounded Robot Navigation
by: Werby, Abdelrhman, et al.
Published: (2024)
RoboOmni: Proactive Robot Manipulation in Omni-modal Context
by: Wang, Siyin, et al.
Published: (2025)
VELA: An LLM-Hybrid-as-a-Judge Approach for Evaluating Long Image Captions
by: Matsuda, Kazuki, et al.
Published: (2025)
RoboTrustBench: Benchmarking the Trustworthiness of Video World Models for Robotic Manipulation
by: Li, Huiqiong, et al.
Published: (2026)
AIRoA MoMa Dataset: A Large-Scale Hierarchical Dataset for Mobile Manipulation
by: Takanami, Ryosuke, et al.
Published: (2025)
AnySkill: Learning Open-Vocabulary Physical Skill for Interactive Agents
by: Cui, Jieming, et al.
Published: (2024)
GROVE: A Generalized Reward for Learning Open-Vocabulary Physical Skill
by: Cui, Jieming, et al.
Published: (2025)
RoboUniView: Visual-Language Model with Unified View Representation for Robotic Manipulation
by: Liu, Fanfan, et al.
Published: (2024)
Open-Vocabulary Online Semantic Mapping for SLAM
by: Martins, Tomas Berriel, et al.
Published: (2024)
LOVON: Legged Open-Vocabulary Object Navigator
by: Peng, Daojie, et al.
Published: (2025)
Virtual Community: An Open World for Humans, Robots, and Society
by: Zhou, Qinhong, et al.
Published: (2025)
OpenESS: Event-based Semantic Scene Understanding with Open Vocabularies
by: Kong, Lingdong, et al.
Published: (2024)
WildOS: Open-Vocabulary Object Search in the Wild
by: Shah, Hardik, et al.
Published: (2026)