Bibliographic Details
Main authors: Pan, Leyan; Cao, Xinyuan
Format: Preprint
Published: 2023
Subjects: Machine Learning
Online access: https://arxiv.org/abs/2309.04644
_version_ 1866914936952717312
author Pan, Leyan
Cao, Xinyuan
author_facet Pan, Leyan
Cao, Xinyuan
contents Neural Collapse (NC) is a geometric structure recently observed at the terminal phase of training deep neural networks, which states that last-layer feature vectors for the same class would "collapse" to a single point, while features of different classes become equally separated. We demonstrate that batch normalization (BN) and weight decay (WD) critically influence the emergence of NC. In the near-optimal loss regime, we establish an asymptotic lower bound on the emergence of NC that depends only on the WD value, training loss, and the presence of last-layer BN. Our experiments substantiate theoretical insights by showing that models demonstrate a stronger presence of NC with BN, appropriate WD values, lower loss, and lower last-layer feature norm. Our findings offer a novel perspective in studying the role of BN and WD in shaping neural network features.
format Preprint
id arxiv_https___arxiv_org_abs_2309_04644
institution arXiv
publishDate 2023
record_format arxiv
title Towards Understanding Neural Collapse: The Effects of Batch Normalization and Weight Decay
topic Machine Learning
url https://arxiv.org/abs/2309.04644
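
The geometry described in the abstract (same-class features collapsing to a single point, class means becoming equally separated) can be illustrated with a small sketch. The code below is not taken from the paper: the function names and the two summary quantities are hypothetical choices made for illustration, and the class means are centered on the average of the class means. It reports a within-class to between-class variance ratio, which approaches 0 under collapse, and the pairwise cosines between centered class means, which in the standard description of Neural Collapse all equal -1/(K-1) for K classes.

```python
import math


def class_means(features, labels):
    """Return {label: mean feature vector} for parallel lists of vectors and labels."""
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nc_summary(features, labels):
    """Hypothetical summary: (within/between variance ratio, pairwise cosines)."""
    means = class_means(features, labels)
    dim = len(features[0])
    # Assumption: center on the average of the class means.
    center = [sum(mu[i] for mu in means.values()) / len(means) for i in range(dim)]
    # Average squared distance of each feature to its own class mean.
    within = sum(
        sum((a - m) ** 2 for a, m in zip(x, means[y]))
        for x, y in zip(features, labels)
    ) / len(features)
    centered = {y: [m - c for m, c in zip(mu, center)] for y, mu in means.items()}
    # Average squared norm of the centered class means.
    between = sum(sum(c * c for c in v) for v in centered.values()) / len(centered)
    keys = sorted(centered)
    cosines = [
        cosine(centered[a], centered[b])
        for i, a in enumerate(keys)
        for b in keys[i + 1:]
    ]
    return within / between, cosines


# Toy data: three fully collapsed classes at the vertices of an equilateral triangle.
s = math.sqrt(3) / 2
toy_features = [[1.0, 0.0], [1.0, 0.0], [-0.5, s], [-0.5, s], [-0.5, -s], [-0.5, -s]]
toy_labels = [0, 0, 1, 1, 2, 2]
ratio, cosines = nc_summary(toy_features, toy_labels)
print(ratio, cosines)  # ratio is 0; each cosine is approximately -0.5, i.e. -1/(K-1) with K=3
```

On real last-layer features the ratio would be small but non-zero and the cosines would only approximate -1/(K-1); how the paper itself quantifies Neural Collapse should be checked against the preprint.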