Staff View: SPO: Sequential Monte Carlo Policy Optimisation :: Library Catalog

Bibliographic Details
Main Authors:	Macfarlane, Matthew V; Toledo, Edan; Byrne, Donal; Duckworth, Paul; Laterre, Alexandre
Format:	Preprint
Published:	2024
Subjects:	Artificial Intelligence; Machine Learning
Online Access:	https://arxiv.org/abs/2402.07963

_version_	1866913568692109312
author	Macfarlane, Matthew V; Toledo, Edan; Byrne, Donal; Duckworth, Paul; Laterre, Alexandre
author_facet	Macfarlane, Matthew V; Toledo, Edan; Byrne, Donal; Duckworth, Paul; Laterre, Alexandre
contents	Leveraging planning during learning and decision-making is central to the long-term development of intelligent agents. Recent works have successfully combined tree-based search methods and self-play learning mechanisms to this end. However, these methods typically face scaling challenges due to the sequential nature of their search. While practical engineering solutions can partly overcome this, they often result in a negative impact on performance. In this paper, we introduce SPO: Sequential Monte Carlo Policy Optimisation, a model-based reinforcement learning algorithm grounded within the Expectation Maximisation (EM) framework. We show that SPO provides robust policy improvement and efficient scaling properties. The sample-based search makes it directly applicable to both discrete and continuous action spaces without modifications. We demonstrate statistically significant improvements in performance relative to model-free and model-based baselines across both continuous and discrete environments. Furthermore, the parallel nature of SPO's search enables effective utilisation of hardware accelerators, yielding favourable scaling laws.
format	Preprint
id	arxiv_https___arxiv_org_abs_2402_07963
institution	arXiv
publishDate	2024
record_format	arxiv
title	SPO: Sequential Monte Carlo Policy Optimisation
topic	Artificial Intelligence; Machine Learning
url	https://arxiv.org/abs/2402.07963

Similar Items