Saved in:
| Main Authors: | Morad, Steven; Lu, Chris; Kortvelesy, Ryan; Liwicki, Stephan; Foerster, Jakob; Prorok, Amanda |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | Machine Learning; Artificial Intelligence |
| Online Access: | https://arxiv.org/abs/2402.09900 |
| _version_ | 1866916456410644480 |
|---|---|
| author | Morad, Steven; Lu, Chris; Kortvelesy, Ryan; Liwicki, Stephan; Foerster, Jakob; Prorok, Amanda |
| author_facet | Morad, Steven; Lu, Chris; Kortvelesy, Ryan; Liwicki, Stephan; Foerster, Jakob; Prorok, Amanda |
| contents | Memory models such as Recurrent Neural Networks (RNNs) and Transformers address Partially Observable Markov Decision Processes (POMDPs) by mapping trajectories to latent Markov states. Neither model scales particularly well to long sequences, especially compared to an emerging class of memory models called Linear Recurrent Models. We discover that the recurrent update of these models resembles a monoid, leading us to reformulate existing models using a novel monoid-based framework that we call memoroids. We revisit the traditional approach to batching in recurrent reinforcement learning, highlighting theoretical and empirical deficiencies. We leverage memoroids to propose a batching method that improves sample efficiency, increases the return, and simplifies the implementation of recurrent loss functions in reinforcement learning. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2402_09900 |
| institution | arXiv |
| publishDate | 2024 |
| record_format | arxiv |
| title | Recurrent Reinforcement Learning with Memoroids |
| topic | Machine Learning; Artificial Intelligence |
| url | https://arxiv.org/abs/2402.09900 |