Staff View: Video Occupancy Models :: Library Catalog

Bibliographic Details
Main Authors:	Tomar, Manan; Hansen-Estruch, Philippe; Bachman, Philip; Lamb, Alex; Langford, John; Taylor, Matthew E.; Levine, Sergey
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2407.09533

_version_	1866907978199728128
author	Tomar, Manan; Hansen-Estruch, Philippe; Bachman, Philip; Lamb, Alex; Langford, John; Taylor, Matthew E.; Levine, Sergey
contents	We introduce a new family of video prediction models designed to support downstream control tasks. We call these models Video Occupancy models (VOCs). VOCs operate in a compact latent space, thus avoiding the need to make predictions about individual pixels. Unlike prior latent-space world models, VOCs directly predict the discounted distribution of future states in a single step, thus avoiding the need for multistep roll-outs. We show that both properties are beneficial when building predictive models of video for use in downstream control. Code is available at https://github.com/manantomar/video-occupancy-models.
format	Preprint
id	arxiv_https___arxiv_org_abs_2407_09533
institution	arXiv
publishDate	2024
record_format	arxiv
title	Video Occupancy Models
topic	Computer Vision and Pattern Recognition; Artificial Intelligence
url	https://arxiv.org/abs/2407.09533
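The contents note says VOCs predict "the discounted distribution of future states" directly, in a single step. To make that prediction target concrete, here is a minimal sketch that is not taken from the paper or its repository. It uses a toy Markov chain with a known transition matrix P and assumes the convention d(s') = (1 - γ) Σ_{k≥0} γ^k Pr(s_{k+1} = s' | s_0), so the "future" state is at least one step ahead. All function names are hypothetical. discounted_occupancy computes this distribution exactly (truncated at a finite horizon). sample_future_state draws one sample from it by picking a geometrically distributed time offset, which is one common way to produce targets for a model that is meant to output the distribution in one step.

```python
import random


def discounted_occupancy(P, s0, gamma, horizon=500):
    """Discounted future-state distribution of a finite Markov chain (hypothetical helper).

    Assumed convention: d(s') = (1 - gamma) * sum_{k>=0} gamma**k * Pr(s_{k+1} = s' | s_0),
    i.e. the future state is at least one step ahead. The sum is truncated at `horizon`.
    """
    n = len(P)
    dist = [0.0] * n
    dist[s0] = 1.0  # distribution of s_0 (a point mass on the start state)
    occ = [0.0] * n
    for k in range(horizon):
        # Propagate one step: dist now holds Pr(s_{k+1} = j | s_0).
        dist = [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]
        for j in range(n):
            occ[j] += (1 - gamma) * gamma**k * dist[j]
    return occ


def sample_future_offset(gamma, rng):
    """Draw k >= 1 with Pr(k) = (1 - gamma) * gamma**(k - 1), a geometric distribution."""
    k = 1
    while rng.random() < gamma:
        k += 1
    return k


def sample_future_state(P, s0, gamma, rng):
    """One Monte Carlo sample from the discounted occupancy of s0 (hypothetical helper)."""
    k = sample_future_offset(gamma, rng)
    s = s0
    for _ in range(k):
        # Step the toy chain; in a video dataset this would instead be the recorded frame at t + k.
        s = rng.choices(range(len(P)), weights=P[s])[0]
    return s


if __name__ == "__main__":
    P = [[0.9, 0.1], [0.2, 0.8]]
    gamma = 0.9
    exact = discounted_occupancy(P, s0=0, gamma=gamma)
    rng = random.Random(0)
    n_samples = 20000
    hits = sum(sample_future_state(P, 0, gamma, rng) == 1 for _ in range(n_samples))
    print("exact d(1|0):", round(exact[1], 3), "sampled:", round(hits / n_samples, 3))
```

The sampled frequency should approach the exact value as n_samples grows. The toy chain needs a step-by-step walk only to generate samples. A learned occupancy model of the kind the abstract describes would instead be trained to output this distribution for a given state in one forward pass. With recorded video, the target could be read off the trajectory at the sampled offset rather than simulated.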

Similar Items