Staff View :: Library Catalog

Bibliographic Details
Main Authors:	Luo, Tianjiao; Pearce, Tim; Chen, Huayu; Chen, Jianfei; Zhu, Jun
Format:	Preprint
Published:	2024
Subjects:	Machine Learning; Systems and Control
Online Access:	https://arxiv.org/abs/2402.16349

author	Luo, Tianjiao; Pearce, Tim; Chen, Huayu; Chen, Jianfei; Zhu, Jun
contents	Generative Adversarial Imitation Learning (GAIL) trains a generative policy to mimic a demonstrator. It uses on-policy Reinforcement Learning (RL) to optimize a reward signal derived from a GAN-like discriminator. A major drawback of GAIL is its training instability: it inherits both the complex training dynamics of GANs and the distribution shift introduced by RL. This can cause oscillations during training, harming its sample efficiency and final policy performance. Recent work has shown that control theory can help GAN training converge. This paper extends this line of work, conducting a control-theoretic analysis of GAIL and deriving a novel controller that not only pushes GAIL to the desired equilibrium but also achieves asymptotic stability in a 'one-step' setting. Based on this, we propose a practical algorithm, 'Controlled-GAIL' (C-GAIL). On MuJoCo tasks, our controlled variant speeds up convergence, reduces the range of oscillation, and matches the expert's distribution more closely, for both vanilla GAIL and GAIL-DAC.
format	Preprint
id	arxiv_https___arxiv_org_abs_2402_16349
institution	arXiv
publishDate	2024
record_format	arxiv
title	C-GAIL: Stabilizing Generative Adversarial Imitation Learning with Control Theory
topic	Machine Learning; Systems and Control
url	https://arxiv.org/abs/2402.16349