Staff View: Learning POMDP World Models from Observations with Language-Model Priors :: Library Catalog

Bibliographic Details
Main Authors:	Six, Valentin; Panse, Frederik; Fajeau, Mathis; Da Costa, Lancelot; Sharma, Mridul; Amayuelas, Alfonso; Xiao, Tim Z.; Hyland, David; Hennig, Philipp; Schölkopf, Bernhard
Format:	Preprint
Published:	2026
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2605.13740

_version_	1866911681702002688
author	Six, Valentin; Panse, Frederik; Fajeau, Mathis; Da Costa, Lancelot; Sharma, Mridul; Amayuelas, Alfonso; Xiao, Tim Z.; Hyland, David; Hennig, Philipp; Schölkopf, Bernhard
author_facet	Six, Valentin; Panse, Frederik; Fajeau, Mathis; Da Costa, Lancelot; Sharma, Mridul; Amayuelas, Alfonso; Xiao, Tim Z.; Hyland, David; Hennig, Philipp; Schölkopf, Bernhard
contents	Whether navigating a building, operating a robot, or playing a game, an agent that acts effectively in an environment must first learn an internal model of how that environment works. Partially-observable Markov decision processes (POMDPs) provide a flexible modeling class for such internal world models, but learning them from observation-action trajectories alone is challenging and typically requires extensive environment interaction. We ask whether language-model priors can reduce costly interaction by leveraging prior knowledge, and introduce Pinductor (POMDP-inductor): an LLM proposes candidate POMDP models from a few observation-action trajectories and iteratively refines them to optimize a belief-based likelihood score. Despite using strictly less information, Pinductor matches the performance and sample efficiency of LLM-based POMDP learning methods that assume privileged access to the hidden state, while significantly surpassing the sample efficiency of tabular POMDP baselines. Further results show that performance scales with LLM capability and degrades gracefully as semantic information about the environment is withheld. Together, these results position language-model priors as a practical tool for sample-efficient world-model learning under partial observability, and a step toward generalist agents in real-world environments. Code is available at https://github.com/atomresearch/pinductor.
format	Preprint
id	arxiv_https___arxiv_org_abs_2605_13740
institution	arXiv
publishDate	2026
record_format	arxiv
title	Learning POMDP World Models from Observations with Language-Model Priors
topic	Machine Learning
url	https://arxiv.org/abs/2605.13740
