Table of Contents: :: Library Catalog

Saved in:

Bibliographic Details
Main Authors:	Roch, Zachary, Zhang, Chi, Atia, George, Wang, Yue
Format:	Preprint
Published:	2025
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2505.12462
Tags:	Add Tag No Tags, Be the first to tag this record!

Table of Contents:

Robust reinforcement learning (RL) under the average-reward criterion is essential for long-term decision-making, particularly when the environment may differ from its specification. However, a significant gap exists in understanding the finite-sample complexity of these methods, as most existing work provides only asymptotic guarantees. This limitation hinders their principled understanding and practical deployment, especially in data-limited scenarios. We close this gap by proposing \textbf{Robust Halpern Iteration (RHI)}, a new algorithm designed for robust Markov Decision Processes (MDPs) with transition uncertainty characterized by $\ell_p$-norm and contamination models. Our approach offers three key advantages over previous methods: (1). Weaker Structural Assumptions: RHI only requires the underlying robust MDP to be communicating, a less restrictive condition than the commonly assumed ergodicity or irreducibility; (2). No Prior Knowledge: Our algorithm operates without requiring any prior knowledge of the robust MDP; (3). State-of-the-Art Sample Complexity: To learn an $ε$-optimal robust policy, RHI achieves a sample complexity of $\tilde{\mathcal O}\left(\frac{SA\mathcal H^{2}}{ε^{2}}\right)$, where $S$ and $A$ denote the numbers of states and actions, and $\mathcal H$ is the robust optimal bias span. This result represents the tightest known bound. Our work hence provides essential theoretical understanding of sample efficiency of robust average reward RL.

Similar Items