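Staff View: Polaris: Open-ended Interactive Robotic Manipulation via Syn2Real Visual Grounding and Large Language Models :: Library Catalog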


Bibliographic Details
Main Authors:	Wang, Tianyu; Lin, Haitao; Yu, Junqiu; Fu, Yanwei
Format:	Preprint
Published:	2024
Subjects:	Robotics; Computation and Language; Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2408.07975

_version_	1866910566412451840
author	Wang, Tianyu; Lin, Haitao; Yu, Junqiu; Fu, Yanwei
author_facet	Wang, Tianyu; Lin, Haitao; Yu, Junqiu; Fu, Yanwei
contents	This paper investigates open-ended interactive robotic manipulation in tabletop scenarios. Although recent Large Language Models (LLMs) improve robots' comprehension of user instructions, their lack of visual grounding limits their ability to interact physically with the environment, because the robot must locate the target object within the physical workspace before manipulating it. To address this, we introduce Polaris, an interactive robotic manipulation framework that integrates perception and interaction by combining GPT-4 with grounded vision models. For precise manipulation, these grounded vision models must produce a detailed pose for the target object rather than merely identifying the pixels that belong to it in the image. We therefore propose a novel Synthetic-to-Real (Syn2Real) pose estimation pipeline, which is trained on rendered synthetic data and then transferred to real-world manipulation tasks. Its real-world performance demonstrates the efficacy of the pipeline and suggests that it can be extended to more general object categories. Real-robot experiments further show strong performance of the framework in grasping and in executing multiple manipulation tasks, indicating its potential to generalize to scenarios beyond the tabletop. More information and video results are available at: https://star-uu-wang.github.io/Polaris/
format	Preprint
id	arxiv_https___arxiv_org_abs_2408_07975
institution	arXiv
publishDate	2024
record_format	arxiv
title	Polaris: Open-ended Interactive Robotic Manipulation via Syn2Real Visual Grounding and Large Language Models
topic	Robotics; Computation and Language; Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2408.07975

Similar Items