Staff View: Residual Connections Harm Generative Representation Learning :: Library Catalog

Bibliographic Details
Main Authors:	Zhang, Xiao; Jiang, Ruoxi; Gao, William; Willett, Rebecca; Maire, Michael
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2404.10947

_version_	1866918520379408384
author	Zhang, Xiao; Jiang, Ruoxi; Gao, William; Willett, Rebecca; Maire, Michael
author_facet	Zhang, Xiao; Jiang, Ruoxi; Gao, William; Willett, Rebecca; Maire, Michael
contents	We show that introducing a weighting factor to reduce the influence of identity shortcuts in residual networks significantly enhances semantic feature learning in generative representation learning frameworks, such as masked autoencoders (MAEs) and diffusion models. Our modification notably improves feature quality, raising ImageNet-1K K-Nearest Neighbor accuracy from 27.4% to 63.9% and linear probing accuracy from 67.8% to 72.7% for MAEs with a ViT-B/16 backbone, while also enhancing generation quality in diffusion models. This significant gap suggests that, while residual connection structure serves an essential role in facilitating gradient propagation, it may have a harmful side effect of reducing capacity for abstract learning by virtue of injecting an echo of shallower representations into deeper layers. We ameliorate this downside via a fixed formula for monotonically decreasing the contribution of identity connections as layer depth increases. Our design promotes the gradual development of feature abstractions, without impacting network trainability. Analyzing the representations learned by our modified residual networks, we find correlation between low effective feature rank and downstream task performance.
format	Preprint
id	arxiv_https___arxiv_org_abs_2404_10947
institution	arXiv
publishDate	2024
record_format	arxiv
title	Residual Connections Harm Generative Representation Learning
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2404.10947
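
Illustrative note: the abstract describes scaling each identity shortcut by a weight that decreases monotonically with layer depth. The sketch below shows one way such a schedule could look. The linear form, the alpha_min value of 0.6, and the names shortcut_weights and forward are assumptions made for illustration; the record does not state the paper's actual formula.

```python
import numpy as np


def shortcut_weights(num_layers, alpha_min=0.6):
    """Hypothetical linear schedule: 1.0 at the first layer, alpha_min at the last."""
    depths = np.arange(num_layers)
    # max(..., 1) avoids division by zero for a single-layer network.
    return 1.0 - (1.0 - alpha_min) * depths / max(num_layers - 1, 1)


def forward(x, blocks, alphas):
    """Apply residual blocks with a down-weighted identity path: x <- alpha * x + f(x)."""
    for block, alpha in zip(blocks, alphas):
        x = alpha * x + block(x)
    return x


if __name__ == "__main__":
    alphas = shortcut_weights(5)
    # Toy residual branches (assumed): a small tanh nonlinearity per layer.
    blocks = [lambda h: 0.1 * np.tanh(h) for _ in alphas]
    print(alphas)
    print(forward(np.ones(4), blocks, alphas))
```

With every alpha set to 1.0 this reduces to a standard residual network, so the schedule is the only change to the forward pass.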
