Staff View: :: Library Catalog

Bibliographic Details
Main Author:	Wang, Qi
Format:	Preprint
Published:	2024
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2401.12882

_version_	1866914649484558336
author	Wang, Qi
author_facet	Wang, Qi
contents	This paper presents a δ-PI algorithm, based on the damped Newton method, for the $H_\infty$ tracking control problem of unknown continuous-time nonlinear systems. A discounted performance function and an augmented system are used to obtain the tracking Hamilton-Jacobi-Isaacs (HJI) equation. The tracking HJI equation is a nonlinear partial differential equation; traditional reinforcement learning methods for solving it are mostly based on the Newton method, which usually guarantees only local convergence and requires a good initial guess. A generalized tracking Bellman equation is first derived from the damped Newton iteration operator equation. The δ-PI algorithm then seeks the optimal solution of the tracking HJI equation by iteratively solving the generalized tracking Bellman equation. Both on-policy and off-policy δ-PI reinforcement learning methods are provided. The off-policy δ-PI algorithm is model-free: it can be performed without a priori knowledge of the system dynamics. A neural-network (NN) based implementation scheme for the off-policy δ-PI algorithm is given. The suitability of the model-free δ-PI algorithm is illustrated with a simulation of a nonlinear system.
format	Preprint
id	arxiv_https___arxiv_org_abs_2401_12882
institution	arXiv
publishDate	2024
record_format	arxiv
title	Model-Free $δ$-Policy Iteration Based on Damped Newton Method for Nonlinear Continuous-Time H$\infty$ Tracking Control
topic	Machine Learning
url	https://arxiv.org/abs/2401.12882

Similar Items