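Staff View: Object Segmentation from Open-Vocabulary Manipulation Instructions Based on Optimal Transport Polygon Matching with Multimodal Foundation Models :: Library Catalog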

Bibliographic Details
Main Authors:	Nishimura, Takayuki; Kuyo, Katsuyuki; Kambara, Motonari; Sugiura, Komei
Format:	Preprint
Published:	2024
Subjects:	Robotics; Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2407.00985

_version_	1866911938819129344
author	Nishimura, Takayuki; Kuyo, Katsuyuki; Kambara, Motonari; Sugiura, Komei
author_facet	Nishimura, Takayuki; Kuyo, Katsuyuki; Kambara, Motonari; Sugiura, Komei
contents	We consider the task of generating segmentation masks for the target object from an object manipulation instruction, which allows users to give open vocabulary instructions to domestic service robots. Conventional segmentation generation approaches often fail to account for objects outside the camera's field of view and cases in which the order of vertices differs but still represents the same polygon, which leads to erroneous mask generation. In this study, we propose a novel method that generates segmentation masks from open vocabulary instructions. We implement a novel loss function using optimal transport to prevent significant loss where the order of vertices differs but still represents the same polygon. To evaluate our approach, we constructed a new dataset based on the REVERIE dataset and Matterport3D dataset. The results demonstrated the effectiveness of the proposed method compared with existing mask generation methods. Remarkably, our best model achieved a +16.32% improvement on the dataset compared with a representative polygon-based method.
format	Preprint
id	arxiv_https___arxiv_org_abs_2407_00985
institution	arXiv
publishDate	2024
record_format	arxiv
title	Object Segmentation from Open-Vocabulary Manipulation Instructions Based on Optimal Transport Polygon Matching with Multimodal Foundation Models
topic	Robotics; Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2407.00985
