Bibliographic Details
Main Authors: Wagner, Dominik; Khajwal, Basim; Ong, C.-H. Luke
Format: Preprint
Published: 2024
Subjects:
Online Access: https://arxiv.org/abs/2402.11752
author Wagner, Dominik
Khajwal, Basim
Ong, C.-H. Luke
contents It is well known that the reparameterisation gradient estimator, which exhibits low variance in practice, is biased for non-differentiable models. This may compromise the correctness of gradient-based optimisation methods such as stochastic gradient descent (SGD). We introduce a simple syntactic framework for defining non-differentiable functions piecewise and present a systematic approach to obtaining smoothings for which the reparameterisation gradient estimator is unbiased. Our main contribution is a novel variant of SGD, Diagonalisation Stochastic Gradient Descent, which progressively enhances the accuracy of the smoothed approximation during optimisation, and we prove convergence to stationary points of the unsmoothed (original) objective. Our empirical evaluation reveals benefits over the state of the art: our approach is simple, fast and stable, and it attains an orders-of-magnitude reduction in work-normalised variance.
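The idea summarised in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The toy objective F(theta) = E[0.5*theta^2 + step(z + theta)] with z ~ N(0, 1), the logistic sigmoid used as the smoothing, the accuracy schedule eta_k = k^(-1/4), the step size 1/k and the function names are all assumptions made for illustration only.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic sigmoid."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def smoothed_grad(theta, z, eta):
    """d/dtheta of 0.5*theta**2 + sigmoid((z + theta) / eta).

    The step function is replaced by a sigmoid of width eta (assumed smoothing).
    """
    s = sigmoid((z + theta) / eta)
    return theta + s * (1.0 - s) / eta


def unsmoothed_grad(theta, z):
    """Reparameterisation gradient of 0.5*theta**2 + step(z + theta).

    The step has zero derivative almost everywhere, so its contribution
    to the gradient is lost; this is the source of the bias.
    """
    return theta


def run_sgd(theta0, steps, smooth, seed=0):
    """SGD on the toy objective; if smooth is True, eta shrinks as k grows."""
    rng = random.Random(seed)
    theta = theta0
    for k in range(1, steps + 1):
        gamma = 1.0 / k          # assumed step-size schedule
        eta = k ** -0.25         # assumed accuracy schedule (eta -> 0)
        z = rng.gauss(0.0, 1.0)  # one sample of the base noise per step
        if smooth:
            theta -= gamma * smoothed_grad(theta, z, eta)
        else:
            theta -= gamma * unsmoothed_grad(theta, z)
    return theta


if __name__ == "__main__":
    print("smoothed, eta shrinking:", run_sgd(1.0, 20000, smooth=True))
    print("unsmoothed:", run_sgd(1.0, 20000, smooth=False))
```

For this toy objective the true gradient is theta + phi(theta), where phi is the standard normal density, so the stationary point lies at approximately theta = -0.37. The smoothed run should settle near that value, while the unsmoothed estimator drives theta to 0, which is not a stationary point of the original objective.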
format Preprint
id arxiv_https___arxiv_org_abs_2402_11752
institution arXiv
publishDate 2024
record_format arxiv
title Diagonalisation SGD: Fast & Convergent SGD for Non-Differentiable Models via Reparameterisation and Smoothing
topic Machine Learning
Artificial Intelligence
Optimization and Control
url https://arxiv.org/abs/2402.11752