Staff View: :: Library Catalog

Saved in:

Bibliographic Details
Main Authors:	Khan, Mohammad Emtiyaz, Rue, Håvard
Format:	Preprint
Published:	2021
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2107.04562
Tags:	Add Tag No Tags, Be the first to tag this record!

_version_	1866913381603082240
author	Khan, Mohammad Emtiyaz Rue, Håvard
author_facet	Khan, Mohammad Emtiyaz Rue, Håvard
contents	We show that many machine-learning algorithms are specific instances of a single algorithm called the \emph{Bayesian learning rule}. The rule, derived from Bayesian principles, yields a wide-range of algorithms from fields such as optimization, deep learning, and graphical models. This includes classical algorithms such as ridge regression, Newton's method, and Kalman filter, as well as modern deep-learning algorithms such as stochastic-gradient descent, RMSprop, and Dropout. The key idea in deriving such algorithms is to approximate the posterior using candidate distributions estimated by using natural gradients. Different candidate distributions result in different algorithms and further approximations to natural gradients give rise to variants of those algorithms. Our work not only unifies, generalizes, and improves existing algorithms, but also helps us design new ones.
format	Preprint
id	arxiv_https___arxiv_org_abs_2107_04562
institution	arXiv
publishDate	2021
record_format	arxiv
spellingShingle	The Bayesian Learning Rule Khan, Mohammad Emtiyaz Rue, Håvard Machine Learning We show that many machine-learning algorithms are specific instances of a single algorithm called the \emph{Bayesian learning rule}. The rule, derived from Bayesian principles, yields a wide-range of algorithms from fields such as optimization, deep learning, and graphical models. This includes classical algorithms such as ridge regression, Newton's method, and Kalman filter, as well as modern deep-learning algorithms such as stochastic-gradient descent, RMSprop, and Dropout. The key idea in deriving such algorithms is to approximate the posterior using candidate distributions estimated by using natural gradients. Different candidate distributions result in different algorithms and further approximations to natural gradients give rise to variants of those algorithms. Our work not only unifies, generalizes, and improves existing algorithms, but also helps us design new ones.
title	The Bayesian Learning Rule
topic	Machine Learning
url	https://arxiv.org/abs/2107.04562

Similar Items