Bibliographic Details
Main Authors: Han, Xinchen; Afifi, Hossam; Marot, Michel
Format: Preprint
Published: 2025
Subjects: Machine Learning; Artificial Intelligence
Online Access: https://arxiv.org/abs/2501.08907
author Han, Xinchen
Afifi, Hossam
Marot, Michel
contents Offline Reinforcement Learning (RL) faces a fundamental challenge of extrapolation errors caused by out-of-distribution (OOD) actions. Implicit Q-Learning (IQL) employs expectile regression to achieve in-sample learning. Nevertheless, IQL relies on a fixed expectile hyperparameter and a density-based policy improvement method, both of which impede its adaptability and performance. In this paper, we propose Projective IQL (PIQL), a projective variant of IQL enhanced with a support constraint. In the policy evaluation stage, PIQL substitutes the fixed expectile hyperparameter with a projection-based parameter and extends the one-step value estimation to a multi-step formulation. In the policy improvement stage, PIQL adopts a support constraint instead of a density constraint, ensuring closer alignment with the policy evaluation. Theoretically, we demonstrate that PIQL maintains the expectile regression and in-sample learning framework, guarantees monotonic policy improvement, and introduces a progressively more rigorous criterion for advantageous actions. Experiments on D4RL and NeoRL2 benchmarks demonstrate robust gains across diverse domains, achieving state-of-the-art performance overall.
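The expectile regression mentioned in the abstract can be illustrated with a minimal sketch. It shows only the standard asymmetric squared loss used by IQL with a fixed expectile `tau`, which is the hyperparameter PIQL is said to replace with a projection-based parameter; it is not the authors' implementation and does not reproduce PIQL's projection or multi-step estimate. The function names `expectile_loss` and `fit_expectile`, the toy target values, and the learning-rate settings are assumptions made for illustration.

```python
# Illustrative sketch only: the standard fixed-tau expectile loss from IQL,
# not PIQL's projection-based variant. All names here are hypothetical.

def expectile_loss(diff, tau):
    """Asymmetric squared loss |tau - 1(diff < 0)| * diff**2."""
    weight = tau if diff >= 0 else 1.0 - tau
    return weight * diff * diff


def fit_expectile(targets, tau, lr=0.05, steps=2000):
    """Fit a scalar v that minimizes the mean expectile loss of (q - v)."""
    v = sum(targets) / len(targets)  # start from the ordinary mean
    for _ in range(steps):
        grad = 0.0
        for q in targets:
            diff = q - v
            weight = tau if diff >= 0 else 1.0 - tau
            grad += -2.0 * weight * diff  # derivative of the loss w.r.t. v
        v -= lr * grad / len(targets)
    return v


if __name__ == "__main__":
    q_values = [0.0, 1.0, 2.0, 3.0, 10.0]  # toy in-sample Q targets (assumed)
    for tau in (0.5, 0.7, 0.9):
        print(tau, round(fit_expectile(q_values, tau), 4))
```

With `tau = 0.5` the fitted value is the ordinary mean of the targets; as `tau` approaches 1 it moves toward the largest in-sample target, which is how a fixed expectile approximates a maximum over actions seen in the dataset without querying out-of-distribution actions.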
format Preprint
id arxiv_https___arxiv_org_abs_2501_08907
institution arXiv
publishDate 2025
record_format arxiv
title PIQL: Projective Implicit Q-Learning with Support Constraint for Offline Reinforcement Learning
topic Machine Learning
Artificial Intelligence
url https://arxiv.org/abs/2501.08907