| Main Authors: | Malloy, Tailia; Klinger, Tim; Liu, Miao; Riemer, Matthew; Tesauro, Gerald; Sims, Chris R. |
|---|---|
| Format: | Preprint |
| Published: | 2020 |
| Subjects: | Artificial Intelligence |
| Online Access: | https://arxiv.org/abs/2011.11517 |
| _version_ | 1866909611017109504 |
|---|---|
| author | Malloy, Tailia; Klinger, Tim; Liu, Miao; Riemer, Matthew; Tesauro, Gerald; Sims, Chris R. |
| author_facet | Malloy, Tailia; Klinger, Tim; Liu, Miao; Riemer, Matthew; Tesauro, Gerald; Sims, Chris R. |
| contents | This paper introduces an information-theoretic constraint on learned policy complexity in the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) reinforcement learning algorithm. Previous research with a related approach in continuous control experiments suggests that this method favors learning policies that are more robust to changing environment dynamics. The multi-agent game setting naturally requires this type of robustness, as other agents' policies change throughout learning, introducing a nonstationary environment. For this reason, recent methods in continual learning are compared to our approach, termed Capacity-Limited MADDPG. Results from experimentation in multi-agent cooperative and competitive tasks demonstrate that the capacity-limited approach is a good candidate for improving learning performance in these environments. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2011_11517 |
| institution | arXiv |
| publishDate | 2020 |
| record_format | arxiv |
| title | Consolidation via Policy Information Regularization in Deep RL for Multi-Agent Games |
| topic | Artificial Intelligence |
| url | https://arxiv.org/abs/2011.11517 |