Bibliographic Details
Main Authors: Hirsch, Christian; Willhalm, Daniel
Format: Preprint
Published: 2024
Subjects: Probability; 60F10, 68T07, 34F05
Online Access: https://arxiv.org/abs/2403.09310
author Hirsch, Christian
Willhalm, Daniel
contents We study large deviations in the context of stochastic gradient descent for one-hidden-layer neural networks with quadratic loss. We derive a quenched large deviation principle, where we condition on an initial weight measure, and an annealed large deviation principle for the empirical weight evolution during training when letting the number of neurons and the number of training iterations simultaneously tend to infinity. The weight evolution is treated as an interacting dynamic particle system. The distinctive aspect compared to prior work on interacting particle systems lies in the discrete particle updates, simultaneously with a growing number of particles.
format Preprint
id arxiv_https___arxiv_org_abs_2403_09310
institution arXiv
publishDate 2024
record_format arxiv
title Large deviations of one-hidden-layer neural networks
topic Probability
60F10, 68T07, 34F05
url https://arxiv.org/abs/2403.09310