Guardado en:
| Autores principales: | , |
|---|---|
| Formato: | Preprint |
| Publicado: |
2025
|
| Materias: | |
| Acceso en línea: | https://arxiv.org/abs/2512.11082 |
| Etiquetas: |
Agregar Etiqueta
Sin Etiquetas, Sea el primero en etiquetar este registro!
|
| _version_ | 1866908707062808576 |
|---|---|
| author | van Zuijlen, Roy Antunes, Duarte |
| author_facet | van Zuijlen, Roy Antunes, Duarte |
| contents | Memoryless and finite-memory policies offer a practical alternative for solving partially observable Markov decision processes (POMDPs), as they operate directly in the output space rather than in the high-dimensional belief space. However, extending classical methods such as policy iteration to this setting remains difficult; the output process is non-Markovian, making policy-improvement steps interdependent across stages. We introduce a new family of monotonically improving policy-iteration algorithms that alternate between single-stage output-based policy improvements and policy evaluations according to a prescribed periodic pattern. We show that this family admits optimal patterns that maximize a natural computational-efficiency index, and we identify the simplest pattern with minimal period. Building on this structure, we further develop a model-free variant that estimates values from data and learns memoryless policies directly. Across several POMDPs examples, our method achieves significant computational speedups over policy-gradient baselines and recent specialized algorithms in both model-based and model-free settings. |
| format | Preprint |
| id |
arxiv_https___arxiv_org_abs_2512_11082 |
| institution | arXiv |
| publishDate | 2025 |
| record_format | arxiv |
| spellingShingle | Memoryless Policy Iteration for Episodic POMDPs van Zuijlen, Roy Antunes, Duarte Machine Learning Memoryless and finite-memory policies offer a practical alternative for solving partially observable Markov decision processes (POMDPs), as they operate directly in the output space rather than in the high-dimensional belief space. However, extending classical methods such as policy iteration to this setting remains difficult; the output process is non-Markovian, making policy-improvement steps interdependent across stages. We introduce a new family of monotonically improving policy-iteration algorithms that alternate between single-stage output-based policy improvements and policy evaluations according to a prescribed periodic pattern. We show that this family admits optimal patterns that maximize a natural computational-efficiency index, and we identify the simplest pattern with minimal period. Building on this structure, we further develop a model-free variant that estimates values from data and learns memoryless policies directly. Across several POMDPs examples, our method achieves significant computational speedups over policy-gradient baselines and recent specialized algorithms in both model-based and model-free settings. |
| title | Memoryless Policy Iteration for Episodic POMDPs |
| topic | Machine Learning |
| url | https://arxiv.org/abs/2512.11082 |