Gespeichert in:
Bibliographische Detailangaben
Hauptverfasser: Yang, Qianlan, Wang, Yu-Xiong
Format: Preprint
Veröffentlicht: 2024
Schlagworte:
Online-Zugang:https://arxiv.org/abs/2406.04323
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
_version_ 1866929376291979264
author Yang, Qianlan
Wang, Yu-Xiong
author_facet Yang, Qianlan
Wang, Yu-Xiong
contents Training autonomous agents with sparse rewards is a long-standing problem in online reinforcement learning (RL), due to low data efficiency. Prior work overcomes this challenge by extracting useful knowledge from offline data, often accomplished through the learning of action distribution from offline data and utilizing the learned distribution to facilitate online RL. However, since the offline data are given and fixed, the extracted knowledge is inherently limited, making it difficult to generalize to new tasks. We propose a novel approach that leverages offline data to learn a generative diffusion model, coined as Adaptive Trajectory Diffuser (ATraDiff). This model generates synthetic trajectories, serving as a form of data augmentation and consequently enhancing the performance of online RL methods. The key strength of our diffuser lies in its adaptability, allowing it to effectively handle varying trajectory lengths and mitigate distribution shifts between online and offline data. Because of its simplicity, ATraDiff seamlessly integrates with a wide spectrum of RL methods. Empirical evaluation shows that ATraDiff consistently achieves state-of-the-art performance across a variety of environments, with particularly pronounced improvements in complicated settings. Our code and demo video are available at https://atradiff.github.io .
format Preprint
id arxiv_https___arxiv_org_abs_2406_04323
institution arXiv
publishDate 2024
record_format arxiv
spellingShingle ATraDiff: Accelerating Online Reinforcement Learning with Imaginary Trajectories
Yang, Qianlan
Wang, Yu-Xiong
Machine Learning
Artificial Intelligence
Computer Vision and Pattern Recognition
Training autonomous agents with sparse rewards is a long-standing problem in online reinforcement learning (RL), due to low data efficiency. Prior work overcomes this challenge by extracting useful knowledge from offline data, often accomplished through the learning of action distribution from offline data and utilizing the learned distribution to facilitate online RL. However, since the offline data are given and fixed, the extracted knowledge is inherently limited, making it difficult to generalize to new tasks. We propose a novel approach that leverages offline data to learn a generative diffusion model, coined as Adaptive Trajectory Diffuser (ATraDiff). This model generates synthetic trajectories, serving as a form of data augmentation and consequently enhancing the performance of online RL methods. The key strength of our diffuser lies in its adaptability, allowing it to effectively handle varying trajectory lengths and mitigate distribution shifts between online and offline data. Because of its simplicity, ATraDiff seamlessly integrates with a wide spectrum of RL methods. Empirical evaluation shows that ATraDiff consistently achieves state-of-the-art performance across a variety of environments, with particularly pronounced improvements in complicated settings. Our code and demo video are available at https://atradiff.github.io .
title ATraDiff: Accelerating Online Reinforcement Learning with Imaginary Trajectories
topic Machine Learning
Artificial Intelligence
Computer Vision and Pattern Recognition
url https://arxiv.org/abs/2406.04323