Saved in:
| Main Authors: | , , |
|---|---|
| Format: | Preprint |
| Published: |
2024
|
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2402.03113 |
| Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Table of Contents:
- We consider the problem of optimising the expected value of a loss functional over a nonlinear model class of functions, assuming that we have only access to realisations of the gradient of the loss. This is a classical task in statistics, machine learning and physics-informed machine learning. A straightforward solution is to replace the exact objective with a Monte Carlo estimate before employing standard first-order methods like gradient descent, which yields the classical stochastic gradient descent method. But replacing the true objective with an estimate ensues a generalisation error. Rigorous bounds for this error typically require strong compactness and Lipschitz continuity assumptions while providing a very slow decay with sample size. To alleviate these issues, we propose a version of natural gradient descent that is based on optimal sampling methods. Under classical assumptions on the loss and the nonlinear model class, we prove that this scheme converges almost surely monotonically to a stationary point of the true objective. Under Polyak-Lojasiewicz-type conditions, this provides bounds for the generalisation error. As a remarkable result, we show that our stochastic optimisation scheme achieves the linear or exponential convergence rates of deterministic first order descent methods under suitable conditions.