Staff View :: Library Catalog

Bibliographic Details
Main Authors:	He, Bowei; Sun, Zexu; Liu, Jinxin; Zhang, Shuai; Chen, Xu; Ma, Chen
Format:	Preprint
Published:	2023
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2310.04706

_version_	1866911742925209600
author	He, Bowei; Sun, Zexu; Liu, Jinxin; Zhang, Shuai; Chen, Xu; Ma, Chen
author_facet	He, Bowei; Sun, Zexu; Liu, Jinxin; Zhang, Shuai; Chen, Xu; Ma, Chen
contents	In offline imitation learning (IL), an agent aims to learn an optimal expert behavior policy without additional online environment interactions. However, in many real-world scenarios, such as robotic manipulation, the offline dataset is collected from suboptimal behaviors without rewards. Because expert data is scarce, agents tend to simply memorize poor trajectories and are vulnerable to variations in the environment, lacking the ability to generalize to new environments. To automatically generate high-quality expert data and improve the generalization ability of the agent, we propose a framework named Offline Imitation Learning with Counterfactual data Augmentation (OILCA) that performs counterfactual inference. In particular, we leverage an identifiable variational autoencoder to generate counterfactual samples for expert data augmentation. We theoretically analyze the influence of the generated expert data and the improvement in generalization. Moreover, we conduct extensive experiments to demonstrate that our approach significantly outperforms various baselines on both the DeepMind Control Suite benchmark for in-distribution performance and the CausalWorld benchmark for out-of-distribution generalization. Our code is available at https://github.com/ZexuSun/OILCA-NeurIPS23.
format	Preprint
id	arxiv_https___arxiv_org_abs_2310_04706
institution	arXiv
publishDate	2023
record_format	arxiv
title	Offline Imitation Learning with Variational Counterfactual Reasoning
topic	Machine Learning
url	https://arxiv.org/abs/2310.04706
