Library Catalog

Bibliographic Details
Main Authors:	Korekata, Ryosuke; Kaneda, Kanta; Nagashima, Shunya; Imai, Yuto; Sugiura, Komei
Format:	Preprint
Published:	2024
Subjects:	Robotics; Computation and Language; Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2408.07910

Similar Items

Open-Vocabulary Mobile Manipulation Based on Double Relaxed Contrastive Learning with Dense Labeling
by: Yashima, Daichi, et al.
Published: (2024)

Affordance RAG: Hierarchical Multimodal Retrieval with Affordance-Aware Embodied Memory for Mobile Manipulation
by: Korekata, Ryosuke, et al.
Published: (2025)

Object Segmentation from Open-Vocabulary Manipulation Instructions Based on Optimal Transport Polygon Matching with Multimodal Foundation Models
by: Nishimura, Takayuki, et al.
Published: (2024)

Future Success Prediction in Open-Vocabulary Object Manipulation Tasks Based on End-Effector Trajectories
by: Kambara, Motonari, et al.
Published: (2024)

Mobile Manipulation Instruction Generation from Multiple Images with Automatic Metric Enhancement
by: Katsumata, Kei, et al.
Published: (2025)

Deep Space Weather Model: Long-Range Solar Flare Prediction from Multi-Wavelength Images
by: Nagashima, Shunya, et al.
Published: (2025)

Polos: Multimodal Metric Learning from Human Feedback for Image Captioning
by: Wada, Yuiga, et al.
Published: (2024)

Task Success Prediction for Open-Vocabulary Manipulation Based on Multi-Level Aligned Representations
by: Goko, Miyu, et al.
Published: (2024)

FLARE-SSM: Deep State Space Models with Influence-Balanced Loss for 72-Hour Solar Flare Prediction
by: Takagi, Yusuke, et al.
Published: (2025)

Cortical-SSM: A Deep State Space Model for EEG and ECoG Motor Imagery Decoding
by: Suzuki, Shuntaro, et al.
Published: (2025)

NaiLIA: Multimodal Nail Design Retrieval Based on Dense Intent Descriptions and Palette Queries
by: Amemiya, Kanon, et al.
Published: (2026)

Co-Scale Cross-Attentional Transformer for Rearrangement Target Detection
by: Matsuo, Haruka, et al.
Published: (2024)

LILAC: Language-Conditioned Object-Centric Optical Flow for Open-Loop Trajectory Generation
by: Kambara, Motonari, et al.
Published: (2026)

Pre-Manipulation Alignment Prediction with Parallel Deep State-Space and Transformer Models
by: Kambara, Motonari, et al.
Published: (2025)

GENNAV: Polygon Mask Generation for Generalized Referring Navigable Regions
by: Katsumata, Kei, et al.
Published: (2025)

LOVON: Legged Open-Vocabulary Object Navigator
by: Peng, Daojie, et al.
Published: (2025)

WildOS: Open-Vocabulary Object Search in the Wild
by: Shah, Hardik, et al.
Published: (2026)

Open3DTrack: Towards Open-Vocabulary 3D Multi-Object Tracking
by: Ishaq, Ayesha, et al.
Published: (2024)

Kinematify: Open-Vocabulary Synthesis of High-DoF Articulated Objects
by: Wang, Jiawei, et al.
Published: (2025)

OVGrasp: Open-Vocabulary Grasping Assistance via Multimodal Intent Detection
by: Hu, Chen, et al.
Published: (2025)

Articulate AnyMesh: Open-Vocabulary 3D Articulated Objects Modeling
by: Qiu, Xiaowen, et al.
Published: (2025)

DOZE: A Dataset for Open-Vocabulary Zero-Shot Object Navigation in Dynamic Environments
by: Ma, Ji, et al.
Published: (2024)

WoMAP: World Models For Embodied Open-Vocabulary Object Localization
by: Yin, Tenny, et al.
Published: (2025)

DivScene: Towards Open-Vocabulary Object Navigation with Large Vision Language Models in Diverse Scenes
by: Wang, Zhaowei, et al.
Published: (2024)

OpenObj: Open-Vocabulary Object-Level Neural Radiance Fields with Fine-Grained Understanding
by: Deng, Yinan, et al.
Published: (2024)

OV9D: Open-Vocabulary Category-Level 9D Object Pose and Size Estimation
by: Cai, Junhao, et al.
Published: (2024)

DualMap: Online Open-Vocabulary Semantic Mapping for Natural Language Navigation in Dynamic Changing Scenes
by: Jiang, Jiajun, et al.
Published: (2025)

FindAnything: Open-Vocabulary and Object-Centric Mapping for Robot Exploration in Any Environment
by: Laina, Sebastián Barbas, et al.
Published: (2025)

ZINA: Multimodal Fine-grained Hallucination Detection and Editing
by: Wada, Yuiga, et al.
Published: (2025)

Target-Oriented Object Grasping via Multimodal Human Guidance
by: Xie, Pengwei, et al.
Published: (2024)

Open-Vocabulary Online Semantic Mapping for SLAM
by: Martins, Tomas Berriel, et al.
Published: (2024)

OpenESS: Event-based Semantic Scene Understanding with Open Vocabularies
by: Kong, Lingdong, et al.
Published: (2024)

OpenOcc: Open Vocabulary 3D Scene Reconstruction via Occupancy Representation
by: Jiang, Haochen, et al.
Published: (2024)

HomeRobot: Open-Vocabulary Mobile Manipulation
by: Yenamandra, Sriram, et al.
Published: (2023)

NOVA: Next-step Open-Vocabulary Autoregression for 3D Multi-Object Tracking in Autonomous Driving
by: Luo, Kai, et al.
Published: (2026)

OpenGaussian: Towards Point-Level 3D Gaussian-based Open Vocabulary Understanding
by: Wu, Yanmin, et al.
Published: (2024)

HiFlow: Tokenization-Free Scale-Wise Autoregressive Policy Learning via Flow Matching
by: Yashima, Daichi, et al.
Published: (2026)

OLiDM: Object-aware LiDAR Diffusion Models for Autonomous Driving
by: Yan, Tianyi, et al.
Published: (2024)

Are Open-Vocabulary Models Ready for Detection of MEP Elements on Construction Sites
by: Abdalwhab, Abdalwhab, et al.
Published: (2025)

Leveraging Vision-Language Models for Open-Vocabulary Instance Segmentation and Tracking
by: Pätzold, Bastian, et al.
Published: (2025)