Bibliographic Details
Main Authors: Luo, Tianjiao, Pearce, Tim, Chen, Huayu, Chen, Jianfei, Zhu, Jun
Format: Preprint
Published: 2024
Subjects: Machine Learning, Systems and Control
Online Access: https://arxiv.org/abs/2402.16349
_version_ 1866916458448027648
author Luo, Tianjiao
Pearce, Tim
Chen, Huayu
Chen, Jianfei
Zhu, Jun
author_facet Luo, Tianjiao
Pearce, Tim
Chen, Huayu
Chen, Jianfei
Zhu, Jun
contents Generative Adversarial Imitation Learning (GAIL) trains a generative policy to mimic a demonstrator. It uses on-policy Reinforcement Learning (RL) to optimize a reward signal derived from a GAN-like discriminator. A major drawback of GAIL is its training instability: it inherits both the complex training dynamics of GANs and the distribution shift introduced by RL. This can cause oscillations during training, harming sample efficiency and final policy performance. Recent work has shown that control theory can help GAN training converge. This paper extends that line of work, conducting a control-theoretic analysis of GAIL and deriving a novel controller that not only pushes GAIL to the desired equilibrium but also achieves asymptotic stability in a 'one-step' setting. Based on this, we propose a practical algorithm, 'Controlled-GAIL' (C-GAIL). On MuJoCo tasks, the controlled variant speeds up convergence, reduces the range of oscillation, and matches the expert's distribution more closely for both vanilla GAIL and GAIL-DAC.
format Preprint
id arxiv_https___arxiv_org_abs_2402_16349
institution arXiv
publishDate 2024
record_format arxiv
spellingShingle C-GAIL: Stabilizing Generative Adversarial Imitation Learning with Control Theory
Luo, Tianjiao
Pearce, Tim
Chen, Huayu
Chen, Jianfei
Zhu, Jun
Machine Learning
Systems and Control
title C-GAIL: Stabilizing Generative Adversarial Imitation Learning with Control Theory
topic Machine Learning
Systems and Control
url https://arxiv.org/abs/2402.16349