| Main Authors: | Hirsch, Christian; Willhalm, Daniel |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | Probability 60F10, 68T07, 34F05 |
| Online Access: | https://arxiv.org/abs/2403.09310 |
| _version_ | 1866909454151188480 |
|---|---|
| author | Hirsch, Christian; Willhalm, Daniel |
| author_facet | Hirsch, Christian; Willhalm, Daniel |
| contents | We study large deviations in the context of stochastic gradient descent for one-hidden-layer neural networks with quadratic loss. We derive a quenched large deviation principle, where we condition on an initial weight measure, and an annealed large deviation principle for the empirical weight evolution during training when letting the number of neurons and the number of training iterations simultaneously tend to infinity. The weight evolution is treated as an interacting dynamic particle system. The distinctive aspect compared to prior work on interacting particle systems lies in the discrete particle updates, simultaneously with a growing number of particles. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2403_09310 |
| institution | arXiv |
| publishDate | 2024 |
| record_format | arxiv |
| title | Large deviations of one-hidden-layer neural networks |
| topic | Probability 60F10, 68T07, 34F05 |
| url | https://arxiv.org/abs/2403.09310 |