Staff View: Towards Generalist Robot Learning from Internet Video: A Survey :: Library Catalog

Bibliographic Details
Main Authors:	McCarthy, Robert; Tan, Daniel C. H.; Schmidt, Dominik; Acero, Fernando; Herr, Nathan; Du, Yilun; Thuruthel, Thomas G.; Li, Zhibin
Format:	Preprint
Published:	2024
Subjects:	Robotics; Machine Learning
Online Access:	https://arxiv.org/abs/2404.19664

_version_	1866913956765892608
author	McCarthy, Robert; Tan, Daniel C. H.; Schmidt, Dominik; Acero, Fernando; Herr, Nathan; Du, Yilun; Thuruthel, Thomas G.; Li, Zhibin
author_facet	McCarthy, Robert; Tan, Daniel C. H.; Schmidt, Dominik; Acero, Fernando; Herr, Nathan; Du, Yilun; Thuruthel, Thomas G.; Li, Zhibin
contents	Scaling deep learning to massive and diverse internet data has driven remarkable breakthroughs in domains such as video generation and natural language processing. Robot learning, however, has thus far failed to replicate this success and remains constrained by a scarcity of available data. Learning from videos (LfV) methods aim to address this data bottleneck by augmenting traditional robot data with large-scale internet video. This video data provides foundational information regarding physical dynamics, behaviours, and tasks, and can be highly informative for general-purpose robots. This survey systematically examines the emerging field of LfV. We first outline essential concepts, including detailing fundamental LfV challenges such as distribution shift and missing action labels in video data. Next, we comprehensively review current methods for extracting knowledge from large-scale internet video, overcoming LfV challenges, and improving robot learning through video-informed training. The survey concludes with a critical discussion of future opportunities. Here, we emphasize the need for scalable foundation model approaches that can leverage the full range of available internet video and enhance the learning of robot policies and dynamics models. Overall, the survey aims to inform and catalyse future LfV research, driving progress towards general-purpose robots.
format	Preprint
id	arxiv_https___arxiv_org_abs_2404_19664
institution	arXiv
publishDate	2024
record_format	arxiv
title	Towards Generalist Robot Learning from Internet Video: A Survey
topic	Robotics; Machine Learning
url	https://arxiv.org/abs/2404.19664

Similar Items