Staff View: :: Library Catalog

Saved in:

Bibliographic Details
Main Authors:	Flageat, Manon, Lim, Bryan, Cully, Antoine
Format:	Preprint
Published:	2023
Subjects:	Machine Learning Artificial Intelligence
Online Access:	https://arxiv.org/abs/2312.07178
Tags:	Add Tag No Tags, Be the first to tag this record!

_version_	1866909078133932032
author	Flageat, Manon Lim, Bryan Cully, Antoine
author_facet	Flageat, Manon Lim, Bryan Cully, Antoine
contents	Many applications in Reinforcement Learning (RL) usually have noise or stochasticity present in the environment. Beyond their impact on learning, these uncertainties lead the exact same policy to perform differently, i.e. yield different return, from one roll-out to another. Common evaluation procedures in RL summarise the consequent return distributions using solely the expected return, which does not account for the spread of the distribution. Our work defines this spread as the policy reproducibility: the ability of a policy to obtain similar performance when rolled out many times, a crucial property in some real-world applications. We highlight that existing procedures that only use the expected return are limited on two fronts: first an infinite number of return distributions with a wide range of performance-reproducibility trade-offs can have the same expected return, limiting its effectiveness when used for comparing policies; second, the expected return metric does not leave any room for practitioners to choose the best trade-off value for considered applications. In this work, we address these limitations by recommending the use of Lower Confidence Bound, a metric taken from Bayesian optimisation that provides the user with a preference parameter to choose a desired performance-reproducibility trade-off. We also formalise and quantify policy reproducibility, and demonstrate the benefit of our metrics using extensive experiments of popular RL algorithms on common uncertain RL tasks.
format	Preprint
id	arxiv_https___arxiv_org_abs_2312_07178
institution	arXiv
publishDate	2023
record_format	arxiv
spellingShingle	Beyond Expected Return: Accounting for Policy Reproducibility when Evaluating Reinforcement Learning Algorithms Flageat, Manon Lim, Bryan Cully, Antoine Machine Learning Artificial Intelligence Many applications in Reinforcement Learning (RL) usually have noise or stochasticity present in the environment. Beyond their impact on learning, these uncertainties lead the exact same policy to perform differently, i.e. yield different return, from one roll-out to another. Common evaluation procedures in RL summarise the consequent return distributions using solely the expected return, which does not account for the spread of the distribution. Our work defines this spread as the policy reproducibility: the ability of a policy to obtain similar performance when rolled out many times, a crucial property in some real-world applications. We highlight that existing procedures that only use the expected return are limited on two fronts: first an infinite number of return distributions with a wide range of performance-reproducibility trade-offs can have the same expected return, limiting its effectiveness when used for comparing policies; second, the expected return metric does not leave any room for practitioners to choose the best trade-off value for considered applications. In this work, we address these limitations by recommending the use of Lower Confidence Bound, a metric taken from Bayesian optimisation that provides the user with a preference parameter to choose a desired performance-reproducibility trade-off. We also formalise and quantify policy reproducibility, and demonstrate the benefit of our metrics using extensive experiments of popular RL algorithms on common uncertain RL tasks.
title	Beyond Expected Return: Accounting for Policy Reproducibility when Evaluating Reinforcement Learning Algorithms
topic	Machine Learning Artificial Intelligence
url	https://arxiv.org/abs/2312.07178

Similar Items