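Staff View: MBDP: A Model-based Approach to Achieve both Robustness and Sample Efficiency via Double Dropout Planning :: Library Catalog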

Bibliographic Details
Main Authors:	Zhang, Wanpeng; Xiao, Xi; Yao, Yao; Chen, Mingzhe; Luo, Dijun
Format:	Preprint
Published:	2021
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2108.01295

_version_	1866917655156359168
author	Zhang, Wanpeng; Xiao, Xi; Yao, Yao; Chen, Mingzhe; Luo, Dijun
author_facet	Zhang, Wanpeng; Xiao, Xi; Yao, Yao; Chen, Mingzhe; Luo, Dijun
contents	Model-based reinforcement learning is a widely accepted solution for solving excessive sample demands. However, the predictions of the dynamics models are often not accurate enough, and the resulting bias may incur catastrophic decisions due to insufficient robustness. Therefore, it is highly desired to investigate how to improve the robustness of model-based RL algorithms while maintaining high sampling efficiency. In this paper, we propose Model-Based Double-dropout Planning (MBDP) to balance robustness and efficiency. MBDP consists of two kinds of dropout mechanisms, where the rollout-dropout aims to improve the robustness with a small cost of sample efficiency, while the model-dropout is designed to compensate for the lost efficiency at a slight expense of robustness. By combining them in a complementary way, MBDP provides a flexible control mechanism to meet different demands of robustness and efficiency by tuning two corresponding dropout ratios. The effectiveness of MBDP is demonstrated both theoretically and experimentally.
format	Preprint
id	arxiv_https___arxiv_org_abs_2108_01295
institution	arXiv
publishDate	2021
record_format	arxiv
title	MBDP: A Model-based Approach to Achieve both Robustness and Sample Efficiency via Double Dropout Planning
topic	Machine Learning
url	https://arxiv.org/abs/2108.01295

Similar Items