Staff View: EgoPAT3Dv2: Predicting 3D Action Target from 2D Egocentric Vision for Human-Robot Interaction

Bibliographic Details
Main Authors:	Fang, Irving; Chen, Yuzhong; Wang, Yifan; Zhang, Jianghan; Zhang, Qiushi; Xu, Jiali; He, Xibo; Gao, Weibo; Su, Hao; Li, Yiming; Feng, Chen
Format:	Preprint
Published:	2024
Subjects:	Robotics
Online Access:	https://arxiv.org/abs/2403.05046

_version_	1866910358326738944
author	Fang, Irving; Chen, Yuzhong; Wang, Yifan; Zhang, Jianghan; Zhang, Qiushi; Xu, Jiali; He, Xibo; Gao, Weibo; Su, Hao; Li, Yiming; Feng, Chen
author_facet	Fang, Irving; Chen, Yuzhong; Wang, Yifan; Zhang, Jianghan; Zhang, Qiushi; Xu, Jiali; He, Xibo; Gao, Weibo; Su, Hao; Li, Yiming; Feng, Chen
contents	A robot's ability to anticipate the 3D action target location of a hand's movement from egocentric videos can greatly improve safety and efficiency in human-robot interaction (HRI). While previous research predominantly focused on semantic action classification or 2D target region prediction, we argue that predicting the action target's 3D coordinate could pave the way for more versatile downstream robotics tasks, especially given the increasing prevalence of headset devices. This study expands EgoPAT3D, the sole dataset dedicated to egocentric 3D action target prediction. We augment both its size and diversity, enhancing its potential for generalization. Moreover, we substantially enhance the baseline algorithm by introducing a large pre-trained model and human prior knowledge. Remarkably, our novel algorithm can now achieve superior prediction outcomes using solely RGB images, eliminating the previous need for 3D point clouds and IMU input. Furthermore, we deploy our enhanced baseline algorithm on a real-world robotic platform to illustrate its practical utility in straightforward HRI tasks. The demonstrations showcase the real-world applicability of our advancements and may inspire more HRI use cases involving egocentric vision. All code and data are open-sourced and can be found on the project website.
format	Preprint
id	arxiv_https___arxiv_org_abs_2403_05046
institution	arXiv
publishDate	2024
record_format	arxiv
title	EgoPAT3Dv2: Predicting 3D Action Target from 2D Egocentric Vision for Human-Robot Interaction
topic	Robotics
url	https://arxiv.org/abs/2403.05046