Staff View :: Library Catalog

Bibliographic Details
Main Authors:	Pemberton, Joseph; Costa, Rui Ponte
Format:	Preprint
Published:	2024
Subjects:	Machine Learning 68T07
Online Access:	https://arxiv.org/abs/2401.07044

_version_	1866929208842780672
author	Pemberton, Joseph; Costa, Rui Ponte
author_facet	Pemberton, Joseph; Costa, Rui Ponte
contents	Training recurrent neural networks typically relies on backpropagation through time (BPTT). BPTT requires the forward and backward passes to be completed, leaving the network locked to these computations until loss gradients are available. Recently, Jaderberg et al. proposed synthetic gradients to alleviate the need for full BPTT. In their implementation, synthetic gradients are learned through a mixture of backpropagated gradients and bootstrapped synthetic gradients, analogous to the temporal difference (TD) algorithm in reinforcement learning (RL). However, as in TD learning, heavy use of bootstrapping can result in bias, which leads to poor synthetic gradient estimates. Inspired by accumulate $\mathrm{TD}(λ)$ in RL, we propose a fully online method for learning synthetic gradients which avoids the use of BPTT altogether: accumulate $\mathrm{BP}(λ)$. As in accumulate $\mathrm{TD}(λ)$, we show analytically that accumulate $\mathrm{BP}(λ)$ can control the level of bias by using a mixture of temporal difference errors and recursively defined eligibility traces. We then demonstrate empirically that our model outperforms the original implementation for learning synthetic gradients in a variety of tasks, and is particularly suited for capturing longer timescales. Finally, building on recent work, we reflect on accumulate $\mathrm{BP}(λ)$ as a principle for learning in biological circuits. In summary, inspired by RL principles, we introduce an algorithm capable of bias-free online learning via synthetic gradients.
format	Preprint
id	arxiv_https___arxiv_org_abs_2401_07044
institution	arXiv
publishDate	2024
record_format	arxiv
title	BP(λ): Online Learning via Synthetic Gradients
topic	Machine Learning 68T07
url	https://arxiv.org/abs/2401.07044
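The abstract builds on accumulate TD(λ) from reinforcement learning, where the parameter λ trades pure bootstrapping (λ = 0) against longer-horizon credit assignment through eligibility traces (λ = 1). The sketch below illustrates only that RL mechanism, as a hedged aid to reading the abstract; it is not the paper's BP(λ) algorithm for synthetic gradients, which this record does not describe. It assumes online TD(λ) with accumulating traces and a linear value estimate on a toy deterministic chain; the functions `dot`, `td_lambda_episode`, and `chain` and all parameter values are hypothetical.

```python
# Hypothetical illustration of accumulate TD(lambda) from RL, the method the
# abstract cites as inspiration. This is NOT the paper's BP(lambda) algorithm.


def dot(a, b):
    """Inner product of two equal-length feature/weight vectors."""
    return sum(x * y for x, y in zip(a, b))


def td_lambda_episode(features, rewards, w, gamma, lam, alpha):
    """Run one episode of online TD(lambda) with accumulating traces.

    features: list of T + 1 feature vectors; features[t] describes the state
              at time t and features[T] is the terminal state (all zeros).
    rewards:  list of T rewards; rewards[t] is received on step t -> t + 1.
    w:        weights of the linear value estimate v(s) = dot(x(s), w).
    """
    w = list(w)
    e = [0.0] * len(w)  # eligibility trace, reset at the start of every episode
    for t, r in enumerate(rewards):
        x_t, x_next = features[t], features[t + 1]
        # TD error: bootstrapped one-step target minus the current estimate
        delta = r + gamma * dot(x_next, w) - dot(x_t, w)
        # Accumulating trace: decay by gamma * lambda, then add current features
        e = [gamma * lam * ei + xi for ei, xi in zip(e, x_t)]
        # Every recently visited state receives a share of the current TD error
        w = [wi + alpha * delta * ei for wi, ei in zip(w, e)]
    return w


def chain(n):
    """Hypothetical toy task: n-state deterministic chain, reward 1 at the end."""
    features = [[1.0 if j == i else 0.0 for j in range(n)] for i in range(n)]
    features.append([0.0] * n)  # terminal state has zero features (value 0)
    rewards = [0.0] * (n - 1) + [1.0]
    return features, rewards


if __name__ == "__main__":
    n, gamma, alpha = 5, 0.9, 0.5
    features, rewards = chain(n)
    true_v = [gamma ** (n - 1 - s) for s in range(n)]
    for lam in (0.0, 0.5, 1.0):
        one = td_lambda_episode(features, rewards, [0.0] * n, gamma, lam, alpha)
        w = [0.0] * n
        for _ in range(200):
            w = td_lambda_episode(features, rewards, w, gamma, lam, alpha)
        err = max(abs(a - b) for a, b in zip(w, true_v))
        print(f"lambda={lam}: one episode {[round(v, 3) for v in one]}, "
              f"200 episodes max error {err:.1e}")
```

With λ = 0, only the final state's estimate moves after one episode, because the bootstrapped targets of earlier states are still zero; with λ = 1, the trace carries the terminal reward back to every visited state, which here matches a Monte Carlo update. In this tabular toy case all settings converge to the true values; the bootstrapping bias the abstract refers to arises when bootstrapped estimates are inaccurate, which this sketch does not measure.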
