Internformat: :: Library Catalog

Gespeichert in:

Bibliographische Detailangaben
Hauptverfasser:	Xu, Yang, Aggarwal, Vaneet
Format:	Preprint
Veröffentlicht:	2026
Schlagworte:	Machine Learning Numerical Analysis
Online-Zugang:	https://arxiv.org/abs/2602.00474
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

_version_	1866914540309970944
author	Xu, Yang Aggarwal, Vaneet
author_facet	Xu, Yang Aggarwal, Vaneet
contents	We study fixed-policy evaluation for finite Markov chains that may be reducible and periodic. Classical evaluation methods with gain and bias decomposition are not always diagnostic: the gain records only invariant Cesàro averages, while persistent phase-dependent behavior is absorbed into the bias together with genuinely transient effects. We identify the real peripheral invariant subspace $\mathcal{K}(P)$ of the transition matrix $P$ as the source of this ambiguity. Quotienting by $\mathcal{K}(P)$ is the minimal exact quotient that removes all non-decaying modes and makes the remaining dynamics strictly stable. After choosing a gauge projection $Π$ with kernel $\mathcal{K}(P)$, the reward admits a unique decomposition $r = g_Π^\star + (I-P)v_Π^\star$, where $g_Π^\star$ is a persistent regime profile and $v_Π^\star$ is a gauge-fixed transient component. An exact comparison with classical normalized gain and bias shows that the new pair reallocates the same information so that all persistent modes are represented in $g_Π^\star$ and $v_Π^\star$ is transient. This decomposition reconstructs finite-horizon returns, recovers statewise average reward, admits a transient-cost interpretation, and yields a stable estimator under a generative model.
format	Preprint
id	arxiv_https___arxiv_org_abs_2602_00474
institution	arXiv
publishDate	2026
record_format	arxiv
spellingShingle	Persistent-Transient Policy Evaluation for Markov Chains via Minimal Peripheral Quotients Xu, Yang Aggarwal, Vaneet Machine Learning Numerical Analysis We study fixed-policy evaluation for finite Markov chains that may be reducible and periodic. Classical evaluation methods with gain and bias decomposition are not always diagnostic: the gain records only invariant Cesàro averages, while persistent phase-dependent behavior is absorbed into the bias together with genuinely transient effects. We identify the real peripheral invariant subspace $\mathcal{K}(P)$ of the transition matrix $P$ as the source of this ambiguity. Quotienting by $\mathcal{K}(P)$ is the minimal exact quotient that removes all non-decaying modes and makes the remaining dynamics strictly stable. After choosing a gauge projection $Π$ with kernel $\mathcal{K}(P)$, the reward admits a unique decomposition $r = g_Π^\star + (I-P)v_Π^\star$, where $g_Π^\star$ is a persistent regime profile and $v_Π^\star$ is a gauge-fixed transient component. An exact comparison with classical normalized gain and bias shows that the new pair reallocates the same information so that all persistent modes are represented in $g_Π^\star$ and $v_Π^\star$ is transient. This decomposition reconstructs finite-horizon returns, recovers statewise average reward, admits a transient-cost interpretation, and yields a stable estimator under a generative model.
title	Persistent-Transient Policy Evaluation for Markov Chains via Minimal Peripheral Quotients
topic	Machine Learning Numerical Analysis
url	https://arxiv.org/abs/2602.00474

Ähnliche Einträge