Bibliographic Details
Main Authors: Yang, Greg; Simon, James B.; Bernstein, Jeremy
Format: Preprint
Published: 2023
Subjects: Machine Learning
Online Access: https://arxiv.org/abs/2310.17813
author Yang, Greg
Simon, James B.
Bernstein, Jeremy
author_facet Yang, Greg
Simon, James B.
Bernstein, Jeremy
contents The push to train ever larger neural networks has motivated the study of initialization and training at large network width. A key challenge is to scale training so that a network's internal representations evolve nontrivially at all widths, a process known as feature learning. Here, we show that feature learning is achieved by scaling the spectral norm of weight matrices and their updates like $\sqrt{\texttt{fan-out}/\texttt{fan-in}}$, in contrast to widely used but heuristic scalings based on Frobenius norm and entry size. Our spectral scaling analysis also leads to an elementary derivation of \emph{maximal update parametrization}. All in all, we aim to provide the reader with a solid conceptual understanding of feature learning in neural networks.
format Preprint
id arxiv_https___arxiv_org_abs_2310_17813
institution arXiv
publishDate 2023
record_format arxiv
title A Spectral Condition for Feature Learning
topic Machine Learning
url https://arxiv.org/abs/2310.17813
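
Illustrative note on the abstract: the scaling it describes, a spectral norm of roughly $\sqrt{\texttt{fan-out}/\texttt{fan-in}}$ for a weight matrix or its update, can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names `spectral_norm` and `rescale_to_spectral_target` are hypothetical, the Gaussian starting matrix and the constant factor of exactly 1 are choices made here for illustration, and the spectral norm is estimated by plain power iteration using only the standard library.

```python
import math
import random


def spectral_norm(W, iters=200):
    """Estimate the largest singular value of W (a list of rows) by power iteration."""
    fan_out, fan_in = len(W), len(W[0])
    v = [1.0 / math.sqrt(fan_in)] * fan_in  # unit-norm starting vector
    for _ in range(iters):
        # u = W v
        u = [sum(w * x for w, x in zip(row, v)) for row in W]
        # z = W^T u, i.e. one step of power iteration on W^T W
        z = [sum(W[i][j] * u[i] for i in range(fan_out)) for j in range(fan_in)]
        norm_z = math.sqrt(sum(x * x for x in z))
        if norm_z == 0.0:
            return 0.0
        v = [x / norm_z for x in z]
    u = [sum(w * x for w, x in zip(row, v)) for row in W]
    return math.sqrt(sum(x * x for x in u))  # ||W v|| with ||v|| = 1


def rescale_to_spectral_target(M):
    """Rescale M (fan_out x fan_in) so its spectral norm is sqrt(fan_out / fan_in).

    Hypothetical helper: the same rescaling could be applied to an initial
    weight matrix or to a proposed weight update.
    """
    fan_out, fan_in = len(M), len(M[0])
    target = math.sqrt(fan_out / fan_in)
    sigma = spectral_norm(M)
    if sigma == 0.0:
        raise ValueError("cannot rescale a matrix with zero spectral norm")
    scale = target / sigma
    return [[scale * m for m in row] for row in M]


# Example: a layer mapping 32 inputs to 8 outputs (sizes chosen arbitrarily).
rng = random.Random(0)
fan_out, fan_in = 8, 32
raw = [[rng.gauss(0.0, 1.0) for _ in range(fan_in)] for _ in range(fan_out)]
W = rescale_to_spectral_target(raw)
print("spectral norm:", spectral_norm(W))
print("target:       ", math.sqrt(fan_out / fan_in))
```

The two printed values should agree closely. In practice a numerical library would replace the hand-written power iteration; it is spelled out here only to keep the sketch self-contained.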