Internformat: :: Library Catalog

Gespeichert in:

Bibliographische Detailangaben
Hauptverfasser:	Nidadala, Uday Kumar, Guthi, Venkata Bhumika
Format:	Preprint
Veröffentlicht:	2025
Schlagworte:	Machine Learning
Online-Zugang:	https://arxiv.org/abs/2601.00831
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

_version_	1866908744619655168
author	Nidadala, Uday Kumar Guthi, Venkata Bhumika
author_facet	Nidadala, Uday Kumar Guthi, Venkata Bhumika
contents	Horizon reduction is a common design strategy in offline reinforcement learning (RL), used to mitigate long-horizon credit assignment, improve stability, and enable scalable learning through truncated rollouts, windowed training, or hierarchical decomposition (Levine et al., 2020; Prudencio et al., 2023; Park et al., 2025). Despite recent empirical evidence that horizon reduction can improve scaling on challenging offline RL benchmarks, its theoretical implications remain underdeveloped (Park et al., 2025). In this paper, we show that horizon reduction can induce fundamental and irrecoverable information loss in offline RL. We formalize horizon reduction as learning from fixed-length trajectory segments and prove that, under this paradigm and any learning interface restricted to fixed-length trajectory segments, optimal policies may be statistically indistinguishable from suboptimal ones even with infinite data and perfect function approximation. Through a set of minimal counterexample Markov decision processes (MDPs), we identify three distinct structural failure modes: (i) prefix indistinguishability leading to identifiability failure, (ii) objective misspecification induced by truncated returns, and (iii) offline dataset support and representation aliasing. Our results establish necessary conditions under which horizon reduction can be safe and highlight intrinsic limitations that cannot be overcome by algorithmic improvements alone, complementing algorithmic work on conservative objectives and distribution shift that addresses a different axis of offline RL difficulty (Fujimoto et al., 2019; Kumar et al., 2020; Gulcehre et al., 2020).
format	Preprint
id	arxiv_https___arxiv_org_abs_2601_00831
institution	arXiv
publishDate	2025
record_format	arxiv
spellingShingle	Horizon Reduction as Information Loss in Offline Reinforcement Learning Nidadala, Uday Kumar Guthi, Venkata Bhumika Machine Learning Horizon reduction is a common design strategy in offline reinforcement learning (RL), used to mitigate long-horizon credit assignment, improve stability, and enable scalable learning through truncated rollouts, windowed training, or hierarchical decomposition (Levine et al., 2020; Prudencio et al., 2023; Park et al., 2025). Despite recent empirical evidence that horizon reduction can improve scaling on challenging offline RL benchmarks, its theoretical implications remain underdeveloped (Park et al., 2025). In this paper, we show that horizon reduction can induce fundamental and irrecoverable information loss in offline RL. We formalize horizon reduction as learning from fixed-length trajectory segments and prove that, under this paradigm and any learning interface restricted to fixed-length trajectory segments, optimal policies may be statistically indistinguishable from suboptimal ones even with infinite data and perfect function approximation. Through a set of minimal counterexample Markov decision processes (MDPs), we identify three distinct structural failure modes: (i) prefix indistinguishability leading to identifiability failure, (ii) objective misspecification induced by truncated returns, and (iii) offline dataset support and representation aliasing. Our results establish necessary conditions under which horizon reduction can be safe and highlight intrinsic limitations that cannot be overcome by algorithmic improvements alone, complementing algorithmic work on conservative objectives and distribution shift that addresses a different axis of offline RL difficulty (Fujimoto et al., 2019; Kumar et al., 2020; Gulcehre et al., 2020).
title	Horizon Reduction as Information Loss in Offline Reinforcement Learning
topic	Machine Learning
url	https://arxiv.org/abs/2601.00831

Ähnliche Einträge