Saved in:
Bibliographic Details
Main Authors: Gruhlke, Robert, Nouy, Anthony, Trunschke, Philipp
Format: Preprint
Published: 2024
Subjects:
Online Access:https://arxiv.org/abs/2402.03113
Tags: Add Tag
No Tags, Be the first to tag this record!
_version_ 1866915762558468096
author Gruhlke, Robert
Nouy, Anthony
Trunschke, Philipp
author_facet Gruhlke, Robert
Nouy, Anthony
Trunschke, Philipp
contents We consider the problem of optimising the expected value of a loss functional over a nonlinear model class of functions, assuming that we have only access to realisations of the gradient of the loss. This is a classical task in statistics, machine learning and physics-informed machine learning. A straightforward solution is to replace the exact objective with a Monte Carlo estimate before employing standard first-order methods like gradient descent, which yields the classical stochastic gradient descent method. But replacing the true objective with an estimate ensues a generalisation error. Rigorous bounds for this error typically require strong compactness and Lipschitz continuity assumptions while providing a very slow decay with sample size. To alleviate these issues, we propose a version of natural gradient descent that is based on optimal sampling methods. Under classical assumptions on the loss and the nonlinear model class, we prove that this scheme converges almost surely monotonically to a stationary point of the true objective. Under Polyak-Lojasiewicz-type conditions, this provides bounds for the generalisation error. As a remarkable result, we show that our stochastic optimisation scheme achieves the linear or exponential convergence rates of deterministic first order descent methods under suitable conditions.
format Preprint
id arxiv_https___arxiv_org_abs_2402_03113
institution arXiv
publishDate 2024
record_format arxiv
spellingShingle Optimal sampling for stochastic and natural gradient descent
Gruhlke, Robert
Nouy, Anthony
Trunschke, Philipp
Optimization and Control
Statistics Theory
49M15, 93E24, 68T07, 90C30, 60H30
We consider the problem of optimising the expected value of a loss functional over a nonlinear model class of functions, assuming that we have only access to realisations of the gradient of the loss. This is a classical task in statistics, machine learning and physics-informed machine learning. A straightforward solution is to replace the exact objective with a Monte Carlo estimate before employing standard first-order methods like gradient descent, which yields the classical stochastic gradient descent method. But replacing the true objective with an estimate ensues a generalisation error. Rigorous bounds for this error typically require strong compactness and Lipschitz continuity assumptions while providing a very slow decay with sample size. To alleviate these issues, we propose a version of natural gradient descent that is based on optimal sampling methods. Under classical assumptions on the loss and the nonlinear model class, we prove that this scheme converges almost surely monotonically to a stationary point of the true objective. Under Polyak-Lojasiewicz-type conditions, this provides bounds for the generalisation error. As a remarkable result, we show that our stochastic optimisation scheme achieves the linear or exponential convergence rates of deterministic first order descent methods under suitable conditions.
title Optimal sampling for stochastic and natural gradient descent
topic Optimization and Control
Statistics Theory
49M15, 93E24, 68T07, 90C30, 60H30
url https://arxiv.org/abs/2402.03113