Staff View: Information-theoretic analysis of world models in optimal reward maximizers :: Library Catalog

Bibliographic Details
Main Authors:	Harwood, Alfred; Faustino, Jose; Altair, Alex
Format:	Preprint
Published:	2026
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2602.12963

_version_	1866914328379129856
author	Harwood, Alfred; Faustino, Jose; Altair, Alex
author_facet	Harwood, Alfred; Faustino, Jose; Altair, Alex
contents	An important question in the field of AI is the extent to which successful behaviour requires an internal representation of the world. In this work, we quantify the amount of information an optimal policy provides about the underlying environment. We consider a Controlled Markov Process (CMP) with $n$ states and $m$ actions, assuming a uniform prior over the space of possible transition dynamics. We prove that observing a deterministic policy that is optimal for any non-constant reward function conveys exactly $n \log m$ bits of information about the environment. Specifically, we show that the mutual information between the environment and the optimal policy is $n \log m$ bits. This bound holds across a broad class of objectives, including finite-horizon, infinite-horizon discounted, and time-averaged reward maximization. These findings provide a precise information-theoretic lower bound on the "implicit world model" necessary for optimality.
format	Preprint
id	arxiv_https___arxiv_org_abs_2602_12963
institution	arXiv
publishDate	2026
record_format	arxiv
title	Information-theoretic analysis of world models in optimal reward maximizers
topic	Artificial Intelligence
url	https://arxiv.org/abs/2602.12963
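
Note on the $n \log m$ figure in the contents field: the following is a minimal sketch, not the paper's proof. It assumes (the record does not state this) that the optimal policy is fully determined by the environment and is equally likely to be any of the $m^n$ deterministic policies. Under those assumptions the mutual information equals the entropy of the policy, $\log_2 m^n = n \log_2 m$ bits. The function name `policy_entropy_bits` is hypothetical, and the code only checks the counting identity by brute-force enumeration.

```python
import itertools
import math


def policy_entropy_bits(n_states: int, n_actions: int) -> float:
    """Shannon entropy, in bits, of a uniform distribution over all
    deterministic policies (one action chosen per state).

    Assumption for illustration only: every policy is equally likely.
    """
    # A deterministic policy is a tuple assigning one action to each state.
    policies = list(itertools.product(range(n_actions), repeat=n_states))
    p = 1.0 / len(policies)  # uniform probability of each policy
    return -sum(p * math.log2(p) for _ in policies)


if __name__ == "__main__":
    n, m = 3, 4  # small example: 4**3 = 64 deterministic policies
    print(policy_entropy_bits(n, m), n * math.log2(m))
```

For small `n` and `m` the enumerated entropy matches `n * math.log2(m)`; the enumeration grows as `m**n`, so it is only practical as a sanity check.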

Similar Items