

Bibliographic details
Main authors:	Maran, Davide; Metelli, Alberto Maria; Papini, Matteo; Restelli, Marcello
Format:	Preprint
Published:	2024
Subjects:	Machine Learning, Artificial Intelligence
Online access:	https://arxiv.org/abs/2405.06363

Abstract:

We consider the problem of learning an $\varepsilon$-optimal policy in a general class of continuous-space Markov decision processes (MDPs) having smooth Bellman operators. Given access to a generative model, we achieve rate-optimal sample complexity by performing a simple, \emph{perturbed} version of least-squares value iteration with orthogonal trigonometric polynomials as features. Key to our solution is a novel projection technique based on ideas from harmonic analysis. Our $\widetilde{\mathcal{O}}(\varepsilon^{-2-d/(\nu+1)})$ sample complexity, where $d$ is the dimension of the state-action space and $\nu$ the order of smoothness, recovers the state-of-the-art result of discretization approaches for the special case of Lipschitz MDPs $(\nu=0)$. At the same time, for $\nu\to\infty$, it recovers and greatly generalizes the $\mathcal{O}(\varepsilon^{-2})$ rate of low-rank MDPs, which are more amenable to regression approaches. In this sense, our result bridges the gap between two popular but conflicting perspectives on continuous-space MDPs.
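
The two limiting cases named in the abstract can be checked numerically. The sketch below only evaluates the exponent $2 + d/(\nu+1)$ of the stated bound; it ignores constants and logarithmic factors, and the function name and the choice $d = 4$ are illustrative assumptions, not taken from the paper.

```python
def sample_complexity_exponent(d: int, nu: float) -> float:
    """Return p such that the stated bound reads O~(eps^(-p)), with p = 2 + d / (nu + 1).

    d  : dimension of the state-action space
    nu : order of smoothness (nu = 0 is the Lipschitz case)
    """
    if d < 0 or nu < 0:
        raise ValueError("d and nu must be non-negative")
    return 2.0 + d / (nu + 1.0)


if __name__ == "__main__":
    d = 4  # hypothetical dimension, chosen only for illustration
    for nu in (0, 1, 3, 1e6):
        # nu = 0 gives 2 + d (Lipschitz case); large nu approaches 2 (low-rank-like rate)
        print(f"nu = {nu:g}: exponent = {sample_complexity_exponent(d, nu):.4f}")
```

For fixed $d$, the exponent decreases monotonically from $2 + d$ at $\nu = 0$ toward $2$ as $\nu$ grows, which is the interpolation between the two regimes that the abstract describes.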
