Enregistré dans:
| Auteurs principaux: | , , |
|---|---|
| Format: | Preprint |
| Publié: |
2025
|
| Sujets: | |
| Accès en ligne: | https://arxiv.org/abs/2501.08907 |
| Tags: |
Ajouter un tag
Pas de tags, Soyez le premier à ajouter un tag!
|
| _version_ | 1866910007926194176 |
|---|---|
| author | Han, Xinchen Afifi, Hossam Marot, Michel |
| author_facet | Han, Xinchen Afifi, Hossam Marot, Michel |
| contents | Offline Reinforcement Learning (RL) faces a fundamental challenge of extrapolation errors caused by out-of-distribution (OOD) actions. Implicit Q-Learning (IQL) employs expectile regression to achieve in-sample learning. Nevertheless, IQL relies on a fixed expectile hyperparameter and a density-based policy improvement method, both of which impede its adaptability and performance. In this paper, we propose Projective IQL (PIQL), a projective variant of IQL enhanced with a support constraint. In the policy evaluation stage, PIQL substitutes the fixed expectile hyperparameter with a projection-based parameter and extends the one-step value estimation to a multi-step formulation. In the policy improvement stage, PIQL adopts a support constraint instead of a density constraint, ensuring closer alignment with the policy evaluation. Theoretically, we demonstrate that PIQL maintains the expectile regression and in-sample learning framework, guarantees monotonic policy improvement, and introduces a progressively more rigorous criterion for advantageous actions. Experiments on D4RL and NeoRL2 benchmarks demonstrate robust gains across diverse domains, achieving state-of-the-art performance overall. |
| format | Preprint |
| id |
arxiv_https___arxiv_org_abs_2501_08907 |
| institution | arXiv |
| publishDate | 2025 |
| record_format | arxiv |
| spellingShingle | PIQL: Projective Implicit Q-Learning with Support Constraint for Offline Reinforcement Learning Han, Xinchen Afifi, Hossam Marot, Michel Machine Learning Artificial Intelligence Offline Reinforcement Learning (RL) faces a fundamental challenge of extrapolation errors caused by out-of-distribution (OOD) actions. Implicit Q-Learning (IQL) employs expectile regression to achieve in-sample learning. Nevertheless, IQL relies on a fixed expectile hyperparameter and a density-based policy improvement method, both of which impede its adaptability and performance. In this paper, we propose Projective IQL (PIQL), a projective variant of IQL enhanced with a support constraint. In the policy evaluation stage, PIQL substitutes the fixed expectile hyperparameter with a projection-based parameter and extends the one-step value estimation to a multi-step formulation. In the policy improvement stage, PIQL adopts a support constraint instead of a density constraint, ensuring closer alignment with the policy evaluation. Theoretically, we demonstrate that PIQL maintains the expectile regression and in-sample learning framework, guarantees monotonic policy improvement, and introduces a progressively more rigorous criterion for advantageous actions. Experiments on D4RL and NeoRL2 benchmarks demonstrate robust gains across diverse domains, achieving state-of-the-art performance overall. |
| title | PIQL: Projective Implicit Q-Learning with Support Constraint for Offline Reinforcement Learning |
| topic | Machine Learning Artificial Intelligence |
| url | https://arxiv.org/abs/2501.08907 |