| Main Authors: | Yang, Greg; Simon, James B.; Bernstein, Jeremy |
|---|---|
| Format: | Preprint |
| Published: | 2023 |
| Subjects: | Machine Learning |
| Online Access: | https://arxiv.org/abs/2310.17813 |
| _version_ | 1866911876336582656 |
|---|---|
| author | Yang, Greg; Simon, James B.; Bernstein, Jeremy |
| author_facet | Yang, Greg; Simon, James B.; Bernstein, Jeremy |
| contents | The push to train ever larger neural networks has motivated the study of initialization and training at large network width. A key challenge is to scale training so that a network's internal representations evolve nontrivially at all widths, a process known as feature learning. Here, we show that feature learning is achieved by scaling the spectral norm of weight matrices and their updates like $\sqrt{\texttt{fan-out}/\texttt{fan-in}}$, in contrast to widely used but heuristic scalings based on Frobenius norm and entry size. Our spectral scaling analysis also leads to an elementary derivation of \emph{maximal update parametrization}. All in all, we aim to provide the reader with a solid conceptual understanding of feature learning in neural networks. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2310_17813 |
| institution | arXiv |
| publishDate | 2023 |
| record_format | arxiv |
| title | A Spectral Condition for Feature Learning |
| topic | Machine Learning |
| url | https://arxiv.org/abs/2310.17813 |
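
The scaling rule stated in the abstract, that the spectral norm of each weight matrix and of each update should scale like $\sqrt{\texttt{fan-out}/\texttt{fan-in}}$, can be illustrated with a minimal sketch. This is not the authors' implementation: it enforces the norm by explicit rescaling, which is an assumption made here for illustration, and the names `spectral_norm`, `rescale_to_spectral_norm`, `init_layer`, `scaled_update`, and the `lr` parameter are all hypothetical. It uses only the Python standard library and estimates the spectral norm by power iteration.

```python
import math
import random


def spectral_norm(w, iters=100, seed=0):
    """Estimate the largest singular value of w (a list of rows) by power iteration."""
    rng = random.Random(seed)
    # Start from a random unit vector in the input space.
    v = [rng.gauss(0.0, 1.0) for _ in w[0]]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    sigma = 0.0
    for _ in range(iters):
        # u = W v; since v has unit length, ||u|| approaches the top singular value.
        u = [sum(a * b for a, b in zip(row, v)) for row in w]
        sigma = math.sqrt(sum(x * x for x in u))
        if sigma == 0.0:
            return 0.0
        # v <- W^T u, renormalized to unit length.
        v = [sum(w[i][j] * u[i] for i in range(len(w))) for j in range(len(v))]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    return sigma


def rescale_to_spectral_norm(w, target):
    """Return a copy of w scaled so its estimated spectral norm equals target.

    Assumes w is not the zero matrix.
    """
    scale = target / spectral_norm(w)
    return [[scale * x for x in row] for row in w]


def spectral_target(fan_out, fan_in):
    """The sqrt(fan-out / fan-in) scale named in the abstract."""
    return math.sqrt(fan_out / fan_in)


def init_layer(fan_out, fan_in, seed=0):
    """Gaussian weights rescaled to spectral norm sqrt(fan_out / fan_in) (illustrative)."""
    rng = random.Random(seed)
    w = [[rng.gauss(0.0, 1.0) for _ in range(fan_in)] for _ in range(fan_out)]
    return rescale_to_spectral_norm(w, spectral_target(fan_out, fan_in))


def scaled_update(grad, fan_out, fan_in, lr):
    """Rescale a raw update so its spectral norm is lr * sqrt(fan_out / fan_in).

    lr is a hypothetical width-independent step size, not a value from the paper.
    """
    return rescale_to_spectral_norm(grad, lr * spectral_target(fan_out, fan_in))


if __name__ == "__main__":
    w = init_layer(fan_out=8, fan_in=32)
    print(spectral_norm(w), spectral_target(8, 32))
```

Under these assumptions, a layer created with `init_layer(8, 32)` has spectral norm close to `spectral_target(8, 32)`, and an update passed through `scaled_update` has a spectral norm that is a fixed fraction `lr` of that same scale, whatever the layer width.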