Staff View: Seeing Objects in a Cluttered World: Computational Objectness from Motion in Video :: Library Catalog

Bibliographic Details
Main Authors:	Poland, Douglas; Saini, Amar
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition I.4
Online Access:	https://arxiv.org/abs/2402.01126

_version_	1866911769627197440
author	Poland, Douglas; Saini, Amar
author_facet	Poland, Douglas; Saini, Amar
contents	Perception of the visually disjoint surfaces of our cluttered world as whole objects, physically distinct from those overlapping them, is a cognitive phenomenon called objectness that forms the basis of our visual perception. Shared by all vertebrates and present at birth in humans, it enables object-centric representation and reasoning about the visual world. We present a computational approach to objectness that leverages motion cues and spatio-temporal attention using a pair of supervised spatio-temporal R(2+1)U-Nets. The first network detects motion boundaries and classifies the pixels at those boundaries in terms of their local foreground-background sense. This motion boundary sense (MBS) information is passed, along with a spatio-temporal object attention cue, to an attentional surface perception (ASP) module, which infers the form of the attended object over a sequence of frames and classifies its 'pixels' as visible or obscured. The spatial form of the attention cue is flexible, but it must loosely track the attended object, which need not be visible. We demonstrate the ability of this simple but novel approach to infer objectness from phenomenology without object models, and show that it delivers robust perception of individual attended objects in cluttered scenes, even with blur and camera shake. We show that our data diversity and augmentation minimize bias and facilitate transfer to real video. Finally, we describe how this computational objectness capability can grow in sophistication and anchor a robust modular video object perception framework.
format	Preprint
id	arxiv_https___arxiv_org_abs_2402_01126
institution	arXiv
publishDate	2024
record_format	arxiv
title	Seeing Objects in a Cluttered World: Computational Objectness from Motion in Video
topic	Computer Vision and Pattern Recognition I.4
url	https://arxiv.org/abs/2402.01126

Similar Items