Staff View: Data-Efficient Policy Selection for Navigation in Partial Maps via Subgoal-Based Abstraction :: Library Catalog

Bibliographic Details
Main Authors:	Paudel, Abhishek; Stein, Gregory J.
Format:	Preprint
Published:	2023
Subjects:	Robotics
Online Access:	https://arxiv.org/abs/2304.01094

_version_	1866911750727663616
author	Paudel, Abhishek; Stein, Gregory J.
author_facet	Paudel, Abhishek; Stein, Gregory J.
contents	We present a novel approach for fast and reliable policy selection for navigation in partial maps. Leveraging the recent learning-augmented model-based Learning over Subgoals Planning (LSP) abstraction to plan, our robot reuses data collected during navigation to evaluate how well other alternative policies could have performed via a procedure we call offline alt-policy replay. Costs from offline alt-policy replay constrain policy selection among the LSP-based policies during deployment, allowing for improvements in convergence speed, cumulative regret and average navigation cost. With only limited prior knowledge about the nature of unseen environments, we achieve at least 67% and as much as 96% improvements on cumulative regret over the baseline bandit approach in our experiments in simulated maze and office-like environments.
format	Preprint
id	arxiv_https___arxiv_org_abs_2304_01094
institution	arXiv
publishDate	2023
record_format	arxiv
title	Data-Efficient Policy Selection for Navigation in Partial Maps via Subgoal-Based Abstraction
topic	Robotics
url	https://arxiv.org/abs/2304.01094

Similar Items