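Staff View: SceneComplete: Open-World 3D Scene Completion in Cluttered Real World Environments for Robot Manipulation :: Library Catalog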

Bibliographic Details
Main Authors:	Agarwal, Aditya; Singh, Gaurav; Sen, Bipasha; Lozano-Pérez, Tomás; Kaelbling, Leslie Pack
Format:	Preprint
Published:	2024
Subjects:	Robotics
Online Access:	https://arxiv.org/abs/2410.23643

_version_	1866911254686203904
author	Agarwal, Aditya; Singh, Gaurav; Sen, Bipasha; Lozano-Pérez, Tomás; Kaelbling, Leslie Pack
author_facet	Agarwal, Aditya; Singh, Gaurav; Sen, Bipasha; Lozano-Pérez, Tomás; Kaelbling, Leslie Pack
contents	Careful robot manipulation in everyday cluttered environments requires an accurate understanding of the 3D scene, in order to grasp and place objects stably and reliably and to avoid colliding with other objects. In general, we must construct such a 3D interpretation of a complex scene based on limited input, such as a single RGB-D image. We describe SceneComplete, a system for constructing a complete, segmented, 3D model of a scene from a single view. SceneComplete is a novel pipeline for composing general-purpose pretrained perception modules (vision-language, segmentation, image inpainting, image-to-3D, visual descriptors, and pose estimation) to obtain highly accurate results. We demonstrate its accuracy and effectiveness with respect to ground-truth models in a large benchmark dataset and show that its accurate whole-object reconstruction enables robust grasp proposal generation, including for a dexterous hand. We release the code and additional results on our website.
format	Preprint
id	arxiv_https___arxiv_org_abs_2410_23643
institution	arXiv
publishDate	2024
record_format	arxiv
title	SceneComplete: Open-World 3D Scene Completion in Cluttered Real World Environments for Robot Manipulation
topic	Robotics
url	https://arxiv.org/abs/2410.23643

Similar Items