Staff View: Spatial Cognition from Egocentric Video: Out of Sight, Not Out of Mind

Bibliographic Details
Main Authors:	Plizzari, Chiara; Goel, Shubham; Perrett, Toby; Chalk, Jacob; Kanazawa, Angjoo; Damen, Dima
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2404.05072

_version_	1866917898669260800
author	Plizzari, Chiara; Goel, Shubham; Perrett, Toby; Chalk, Jacob; Kanazawa, Angjoo; Damen, Dima
author_facet	Plizzari, Chiara; Goel, Shubham; Perrett, Toby; Chalk, Jacob; Kanazawa, Angjoo; Damen, Dima
contents	As humans move around, performing their daily tasks, they are able to recall where they have positioned objects in their environment, even if these objects are currently out of their sight. In this paper, we aim to mimic this spatial cognition ability. We thus formulate the task of Out of Sight, Not Out of Mind - 3D tracking active objects using observations captured through an egocentric camera. We introduce a simple but effective approach to address this challenging problem, called Lift, Match, and Keep (LMK). LMK lifts partial 2D observations to 3D world coordinates, matches them over time using visual appearance, 3D location and interactions to form object tracks, and keeps these object tracks even when they go out-of-view of the camera. We benchmark LMK on 100 long videos from EPIC-KITCHENS. Our results demonstrate that spatial cognition is critical for correctly locating objects over short and long time scales. E.g., for one long egocentric video, we estimate the 3D location of 50 active objects. After 120 seconds, 57% of the objects are correctly localised by LMK, compared to just 33% by a recent 3D method for egocentric videos and 17% by a general 2D tracking method.
format	Preprint
id	arxiv_https___arxiv_org_abs_2404_05072
institution	arXiv
publishDate	2024
record_format	arxiv
title	Spatial Cognition from Egocentric Video: Out of Sight, Not Out of Mind
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2404.05072

Similar Items