Internformat: :: Library Catalog

Gespeichert in:

Bibliographische Detailangaben
Hauptverfasser:	Tagle, Augusto, Ruiz-del-Solar, Javier, Tobar, Felipe
Format:	Preprint
Veröffentlicht:	2025
Schlagworte:	Machine Learning
Online-Zugang:	https://arxiv.org/abs/2505.18345
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

_version_	1866911334115835904
author	Tagle, Augusto Ruiz-del-Solar, Javier Tobar, Felipe
author_facet	Tagle, Augusto Ruiz-del-Solar, Javier Tobar, Felipe
contents	Offline reinforcement learning (RL) recovers the optimal policy $π$ given historical observations of an agent. In practice, $π$ is modeled as a weighted version of the agent's behavior policy $μ$, using a weight function $w$ working as a critic of the agent's behavior. Though recent approaches to offline RL based on diffusion models have exhibited promising results, the computation of the required scores is challenging due to their dependence on the unknown $w$. In this work, we alleviate this issue by constructing a diffusion over both the actions and the weights. With the proposed setting, the required scores are directly obtained from the diffusion model without learning extra networks. Our main conceptual contribution is a novel guidance method, where guidance (which is a function of $w$) comes from the same diffusion model, therefore, our proposal is termed Self-Weighted Guidance (SWG). We show that SWG generates samples from the desired distribution on toy examples and performs on par with state-of-the-art methods on D4RL's challenging environments, while maintaining a streamlined training pipeline. We further validate SWG through ablation studies on weight formulations and scalability.
format	Preprint
id	arxiv_https___arxiv_org_abs_2505_18345
institution	arXiv
publishDate	2025
record_format	arxiv
spellingShingle	Diffusion Self-Weighted Guidance for Offline Reinforcement Learning Tagle, Augusto Ruiz-del-Solar, Javier Tobar, Felipe Machine Learning Offline reinforcement learning (RL) recovers the optimal policy $π$ given historical observations of an agent. In practice, $π$ is modeled as a weighted version of the agent's behavior policy $μ$, using a weight function $w$ working as a critic of the agent's behavior. Though recent approaches to offline RL based on diffusion models have exhibited promising results, the computation of the required scores is challenging due to their dependence on the unknown $w$. In this work, we alleviate this issue by constructing a diffusion over both the actions and the weights. With the proposed setting, the required scores are directly obtained from the diffusion model without learning extra networks. Our main conceptual contribution is a novel guidance method, where guidance (which is a function of $w$) comes from the same diffusion model, therefore, our proposal is termed Self-Weighted Guidance (SWG). We show that SWG generates samples from the desired distribution on toy examples and performs on par with state-of-the-art methods on D4RL's challenging environments, while maintaining a streamlined training pipeline. We further validate SWG through ablation studies on weight formulations and scalability.
title	Diffusion Self-Weighted Guidance for Offline Reinforcement Learning
topic	Machine Learning
url	https://arxiv.org/abs/2505.18345

Ähnliche Einträge