| Main authors: | Lefebvre, Randy; Durand, Audrey |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | Artificial Intelligence |
| Online access: | https://arxiv.org/abs/2407.15820 |
| _version_ | 1866910833540333568 |
|---|---|
| author | Lefebvre, Randy; Durand, Audrey |
| author_facet | Lefebvre, Randy; Durand, Audrey |
| contents | Formulating a real-world problem under the Reinforcement Learning framework involves non-trivial design choices, such as selecting a discount factor for the learning objective (discounted cumulative rewards), which articulates the planning horizon of the agent. This work investigates the impact of the discount factor on the bias-variance trade-off given structural parameters of the underlying Markov Decision Process. Our results support the idea that a shorter planning horizon might be beneficial, especially under partial observability. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2407_15820 |
| institution | arXiv |
| publishDate | 2024 |
| record_format | arxiv |
| title | On shallow planning under partial observability |
| topic | Artificial Intelligence |
| url | https://arxiv.org/abs/2407.15820 |