Bibliographic Details
Main Authors: McCarthy, Robert; Tan, Daniel C. H.; Schmidt, Dominik; Acero, Fernando; Herr, Nathan; Du, Yilun; Thuruthel, Thomas G.; Li, Zhibin
Format: Preprint
Published: 2024
Subjects: Robotics, Machine Learning
Online Access: https://arxiv.org/abs/2404.19664
author McCarthy, Robert
Tan, Daniel C. H.
Schmidt, Dominik
Acero, Fernando
Herr, Nathan
Du, Yilun
Thuruthel, Thomas G.
Li, Zhibin
contents Scaling deep learning to massive and diverse internet data has driven remarkable breakthroughs in domains such as video generation and natural language processing. Robot learning, however, has thus far failed to replicate this success and remains constrained by a scarcity of available data. Learning from videos (LfV) methods aim to address this data bottleneck by augmenting traditional robot data with large-scale internet video. This video data provides foundational information regarding physical dynamics, behaviours, and tasks, and can be highly informative for general-purpose robots. This survey systematically examines the emerging field of LfV. We first outline essential concepts, including detailing fundamental LfV challenges such as distribution shift and missing action labels in video data. Next, we comprehensively review current methods for extracting knowledge from large-scale internet video, overcoming LfV challenges, and improving robot learning through video-informed training. The survey concludes with a critical discussion of future opportunities. Here, we emphasize the need for scalable foundation model approaches that can leverage the full range of available internet video and enhance the learning of robot policies and dynamics models. Overall, the survey aims to inform and catalyse future LfV research, driving progress towards general-purpose robots.
format Preprint
id arxiv_https___arxiv_org_abs_2404_19664
institution arXiv
publishDate 2024
record_format arxiv
title Towards Generalist Robot Learning from Internet Video: A Survey
topic Robotics
Machine Learning
url https://arxiv.org/abs/2404.19664