Saved in:
Bibliographic Details
Main Authors: Khan, Mohammad Emtiyaz, Rue, Håvard
Format: Preprint
Published: 2021
Subjects:
Online Access:https://arxiv.org/abs/2107.04562
Tags: Add Tag
No Tags, Be the first to tag this record!
_version_ 1866913381603082240
author Khan, Mohammad Emtiyaz
Rue, Håvard
author_facet Khan, Mohammad Emtiyaz
Rue, Håvard
contents We show that many machine-learning algorithms are specific instances of a single algorithm called the \emph{Bayesian learning rule}. The rule, derived from Bayesian principles, yields a wide-range of algorithms from fields such as optimization, deep learning, and graphical models. This includes classical algorithms such as ridge regression, Newton's method, and Kalman filter, as well as modern deep-learning algorithms such as stochastic-gradient descent, RMSprop, and Dropout. The key idea in deriving such algorithms is to approximate the posterior using candidate distributions estimated by using natural gradients. Different candidate distributions result in different algorithms and further approximations to natural gradients give rise to variants of those algorithms. Our work not only unifies, generalizes, and improves existing algorithms, but also helps us design new ones.
format Preprint
id arxiv_https___arxiv_org_abs_2107_04562
institution arXiv
publishDate 2021
record_format arxiv
spellingShingle The Bayesian Learning Rule
Khan, Mohammad Emtiyaz
Rue, Håvard
Machine Learning
We show that many machine-learning algorithms are specific instances of a single algorithm called the \emph{Bayesian learning rule}. The rule, derived from Bayesian principles, yields a wide-range of algorithms from fields such as optimization, deep learning, and graphical models. This includes classical algorithms such as ridge regression, Newton's method, and Kalman filter, as well as modern deep-learning algorithms such as stochastic-gradient descent, RMSprop, and Dropout. The key idea in deriving such algorithms is to approximate the posterior using candidate distributions estimated by using natural gradients. Different candidate distributions result in different algorithms and further approximations to natural gradients give rise to variants of those algorithms. Our work not only unifies, generalizes, and improves existing algorithms, but also helps us design new ones.
title The Bayesian Learning Rule
topic Machine Learning
url https://arxiv.org/abs/2107.04562