Staff View: Policy-regularized Offline Multi-objective Reinforcement Learning :: Library Catalog

Bibliographic Details
Main Authors:	Lin, Qian; Yu, Chao; Liu, Zongkai; Wu, Zifan
Format:	Preprint
Published:	2024
Subjects:	Machine Learning; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2401.02244

_version_	1866911748047503360
author	Lin, Qian; Yu, Chao; Liu, Zongkai; Wu, Zifan
author_facet	Lin, Qian; Yu, Chao; Liu, Zongkai; Wu, Zifan
contents	In this paper, we aim to utilize only offline trajectory data to train a policy for multi-objective RL. To this end, we extend the offline policy-regularized method, a widely adopted approach for single-objective offline RL problems, to the multi-objective setting. However, such methods face a new challenge in offline MORL settings, namely the preference-inconsistent demonstration problem. We propose two solutions to this problem: 1) filtering out preference-inconsistent demonstrations by approximating behavior preferences, and 2) adopting regularization techniques with high policy expressiveness. Moreover, we integrate the preference-conditioned scalarized update method into policy-regularized offline RL in order to learn a set of policies simultaneously using a single policy network, thus reducing the computational cost of training a large number of individual policies for various preferences. Finally, we introduce Regularization Weight Adaptation to dynamically determine appropriate regularization weights for arbitrary target preferences during deployment. Empirical results on various multi-objective datasets demonstrate the capability of our approach in solving offline MORL problems.
format	Preprint
id	arxiv_https___arxiv_org_abs_2401_02244
institution	arXiv
publishDate	2024
record_format	arxiv
title	Policy-regularized Offline Multi-objective Reinforcement Learning
topic	Machine Learning; Artificial Intelligence
url	https://arxiv.org/abs/2401.02244

Similar Items