Vista Equipo: :: Library Catalog

Guardado en:

Detalles Bibliográficos
Autores principales:	Yu, Xinqiang, Yang, Chuanguang, Yu, Chengqing, Huang, Libo, An, Zhulin, Xu, Yongjun
Formato:	Preprint
Publicado:	2024
Materias:	Machine Learning Artificial Intelligence
Acceso en línea:	https://arxiv.org/abs/2406.05488
Etiquetas:	Agregar Etiqueta Sin Etiquetas, Sea el primero en etiquetar este registro!

_version_	1866916280171233280
author	Yu, Xinqiang Yang, Chuanguang Yu, Chengqing Huang, Libo An, Zhulin Xu, Yongjun
author_facet	Yu, Xinqiang Yang, Chuanguang Yu, Chengqing Huang, Libo An, Zhulin Xu, Yongjun
contents	Policy Distillation (PD) has become an effective method to improve deep reinforcement learning tasks. The core idea of PD is to distill policy knowledge from a teacher agent to a student agent. However, the teacher-student framework requires a well-trained teacher model which is computationally expensive.In the light of online knowledge distillation, we study the knowledge transfer between different policies that can learn diverse knowledge from the same environment.In this work, we propose Online Policy Distillation (OPD) with Decision-Attention (DA), an online learning framework in which different policies operate in the same environment to learn different perspectives of the environment and transfer knowledge to each other to obtain better performance together. With the absence of a well-performance teacher policy, the group-derived targets play a key role in transferring group knowledge to each student policy. However, naive aggregation functions tend to cause student policies quickly homogenize. To address the challenge, we introduce the Decision-Attention module to the online policies distillation framework. The Decision-Attention module can generate a distinct set of weights for each policy to measure the importance of group members. We use the Atari platform for experiments with various reinforcement learning algorithms, including PPO and DQN. In different tasks, our method can perform better than an independent training policy on both PPO and DQN algorithms. This suggests that our OPD-DA can transfer knowledge between different policies well and help agents obtain more rewards.
format	Preprint
id	arxiv_https___arxiv_org_abs_2406_05488
institution	arXiv
publishDate	2024
record_format	arxiv
spellingShingle	Online Policy Distillation with Decision-Attention Yu, Xinqiang Yang, Chuanguang Yu, Chengqing Huang, Libo An, Zhulin Xu, Yongjun Machine Learning Artificial Intelligence Policy Distillation (PD) has become an effective method to improve deep reinforcement learning tasks. The core idea of PD is to distill policy knowledge from a teacher agent to a student agent. However, the teacher-student framework requires a well-trained teacher model which is computationally expensive.In the light of online knowledge distillation, we study the knowledge transfer between different policies that can learn diverse knowledge from the same environment.In this work, we propose Online Policy Distillation (OPD) with Decision-Attention (DA), an online learning framework in which different policies operate in the same environment to learn different perspectives of the environment and transfer knowledge to each other to obtain better performance together. With the absence of a well-performance teacher policy, the group-derived targets play a key role in transferring group knowledge to each student policy. However, naive aggregation functions tend to cause student policies quickly homogenize. To address the challenge, we introduce the Decision-Attention module to the online policies distillation framework. The Decision-Attention module can generate a distinct set of weights for each policy to measure the importance of group members. We use the Atari platform for experiments with various reinforcement learning algorithms, including PPO and DQN. In different tasks, our method can perform better than an independent training policy on both PPO and DQN algorithms. This suggests that our OPD-DA can transfer knowledge between different policies well and help agents obtain more rewards.
title	Online Policy Distillation with Decision-Attention
topic	Machine Learning Artificial Intelligence
url	https://arxiv.org/abs/2406.05488

Ejemplares similares