Bibliographic Details
Main Authors: Traub, Manuel, Becker, Frederic, Sauter, Adrian, Otte, Sebastian, Butz, Martin V.
Format: Preprint
Published: 2023
Subjects: Computer Vision and Pattern Recognition
Online Access: https://arxiv.org/abs/2310.10410
author Traub, Manuel
Becker, Frederic
Sauter, Adrian
Otte, Sebastian
Butz, Martin V.
contents Current slot-oriented approaches for compositional scene segmentation from images and videos rely on provided background information or slot assignments. We present a segmented location and identity tracking system, Loci-Segmented (Loci-s), which requires neither. It learns to dynamically segment scenes into interpretable background and slot-based object encodings, separating RGB, mask, location, and depth information for each. The results reveal largely superior video decomposition performance on the MOVi datasets and on another established dataset collection targeting scene segmentation. The system's well-interpretable, compositional latent encodings may serve as a foundation model for downstream tasks.
institution arXiv
title Loci-Segmented: Improving Scene Segmentation Learning
topic Computer Vision and Pattern Recognition