MARC21: :: Library Catalog

Bibliographic Details
Main authors:	Li, Tianqin; Zhao, Junru; Jiang, Dunhan; Wu, Shenghao; Ramirez, Alan; Lee, Tai Sing
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online access:	https://arxiv.org/abs/2506.01201

_version_	1866911585469988864
author	Li, Tianqin Zhao, Junru Jiang, Dunhan Wu, Shenghao Ramirez, Alan Lee, Tai Sing
author_facet	Li, Tianqin Zhao, Junru Jiang, Dunhan Wu, Shenghao Ramirez, Alan Lee, Tai Sing
contents	David Marr's seminal theory of human perception holds that visual processing is a multi-stage process that derives boundary and surface properties before forming semantic object representations. In contrast, contrastive representation learning frameworks typically bypass this explicit multi-stage approach and define their objective as directly learning a semantic representation space for objects. While effective in general contexts, this approach sacrifices the inductive biases of vision, leading to slower convergence and to shortcut learning that results in texture bias. In this work, we demonstrate that leveraging Marr's multi-stage theory (first constructing boundary and surface-level representations using perceptual constructs from early visual processing stages, then training for object semantics) leads to 2x faster convergence on ResNet18, improved final representations on semantic segmentation, depth estimation, and object recognition, and enhanced robustness and out-of-distribution capability. Taken together, we propose a pretraining stage that precedes general contrastive representation pretraining, to further enhance the final representation quality and reduce the overall convergence time via inductive bias from human vision systems.
format	Preprint
id	arxiv_https___arxiv_org_abs_2506_01201
institution	arXiv
publishDate	2025
record_format	arxiv
title	Perceptual Inductive Bias Is What You Need Before Contrastive Learning
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2506.01201