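| Main Authors: | Tomar, Manan; Hansen-Estruch, Philippe; Bachman, Philip; Lamb, Alex; Langford, John; Taylor, Matthew E.; Levine, Sergey |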
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | Computer Vision and Pattern Recognition; Artificial Intelligence |
| Online Access: | https://arxiv.org/abs/2407.09533 |
| _version_ | 1866907978199728128 |
|---|---|
| author | Tomar, Manan; Hansen-Estruch, Philippe; Bachman, Philip; Lamb, Alex; Langford, John; Taylor, Matthew E.; Levine, Sergey |
| author_facet | Tomar, Manan; Hansen-Estruch, Philippe; Bachman, Philip; Lamb, Alex; Langford, John; Taylor, Matthew E.; Levine, Sergey |
| contents | We introduce a new family of video prediction models designed to support downstream control tasks. We call these models Video Occupancy models (VOCs). VOCs operate in a compact latent space, thus avoiding the need to make predictions about individual pixels. Unlike prior latent-space world models, VOCs directly predict the discounted distribution of future states in a single step, thus avoiding the need for multistep roll-outs. We show that both properties are beneficial when building predictive models of video for use in downstream control. Code is available at https://github.com/manantomar/video-occupancy-models. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2407_09533 |
| institution | arXiv |
| publishDate | 2024 |
| record_format | arxiv |
| title | Video Occupancy Models |
| topic | Computer Vision and Pattern Recognition; Artificial Intelligence |
| url | https://arxiv.org/abs/2407.09533 |