Staff View: Learning from Reward-Free Offline Data: A Case for Planning with Latent Dynamics Models :: Library Catalog

Bibliographic Details
Main Authors:	Sobal, Vlad; Zhang, Wancong; Cho, Kyunghyun; Balestriero, Randall; Rudner, Tim G. J.; LeCun, Yann
Format:	Preprint
Published:	2025
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2502.14819

_version_	1866909874299863040
author	Sobal, Vlad; Zhang, Wancong; Cho, Kyunghyun; Balestriero, Randall; Rudner, Tim G. J.; LeCun, Yann
author_facet	Sobal, Vlad; Zhang, Wancong; Cho, Kyunghyun; Balestriero, Randall; Rudner, Tim G. J.; LeCun, Yann
contents	A long-standing goal in AI is to develop agents capable of solving diverse tasks across a range of environments, including those never seen during training. Two dominant paradigms address this challenge: (i) reinforcement learning (RL), which learns policies via trial and error, and (ii) optimal control, which plans actions using a known or learned dynamics model. However, their comparative strengths in the offline setting - where agents must learn from reward-free trajectories - remain underexplored. In this work, we systematically evaluate RL and control-based methods on a suite of navigation tasks, using offline datasets of varying quality. On the RL side, we consider goal-conditioned and zero-shot methods. On the control side, we train a latent dynamics model using the Joint Embedding Predictive Architecture (JEPA) and employ it for planning. We investigate how factors such as data diversity, trajectory quality, and environment variability influence the performance of these approaches. Our results show that model-free RL benefits most from large amounts of high-quality data, whereas model-based planning generalizes better to unseen layouts and is more data-efficient, while achieving trajectory stitching performance comparable to leading model-free methods. Notably, planning with a latent dynamics model proves to be a strong approach for handling suboptimal offline data and adapting to diverse environments.
format	Preprint
id	arxiv_https___arxiv_org_abs_2502_14819
institution	arXiv
publishDate	2025
record_format	arxiv
title	Learning from Reward-Free Offline Data: A Case for Planning with Latent Dynamics Models
topic	Machine Learning
url	https://arxiv.org/abs/2502.14819
