Bibliographic Details
Main Authors: Kim, Jihun, Lavaei, Javad
Format: Preprint
Published: 2024
Subjects: Systems and Control
Online Access: https://arxiv.org/abs/2410.03230
_version_ 1866914964569063424
author Kim, Jihun
Lavaei, Javad
author_facet Kim, Jihun
Lavaei, Javad
contents This paper studies online bandit nonlinear control, which aims to learn the best stabilizing controller from a pool of stabilizing and destabilizing controllers of unknown types for a given nonlinear dynamical system. We develop an algorithm, named Dynamic Batch length and Adaptive learning Rate (DBAR), and study its stability and regret. Unlike the existing Exp3 algorithm, which requires an exponentially stabilizing controller, DBAR needs only a significantly weaker notion of controller stability, under which substantial time may be required to certify the stability of the system. The dynamic batch length in DBAR addresses this issue and enables the system to attain asymptotic stability, where the algorithm behaves as if there were no destabilizing controllers. Moreover, the adaptive learning rate in DBAR uses only state norm information to achieve a tight regret bound even when none of the stabilizing controllers in the pool is exponentially stabilizing.
format Preprint
id arxiv_https___arxiv_org_abs_2410_03230
institution arXiv
publishDate 2024
record_format arxiv
spellingShingle Online Bandit Nonlinear Control with Dynamic Batch Length and Adaptive Learning Rate
Kim, Jihun
Lavaei, Javad
Systems and Control
68W27, 93C10
title Online Bandit Nonlinear Control with Dynamic Batch Length and Adaptive Learning Rate
topic Systems and Control
68W27, 93C10
url https://arxiv.org/abs/2410.03230
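
The abstract contrasts DBAR with Exp3 applied to a pool of controllers. As a rough illustration of that baseline idea only, the sketch below runs a generic Exp3-style selection over fixed-length batches. It is not the DBAR algorithm: this record does not give DBAR's batch-length or learning-rate rules, so both are held constant here. The scalar system `x_next = (1.2 + K) * x + w`, the gain pool, the bounded loss, and every function and parameter name are hypothetical choices made for the example.

```python
import math
import random


def exp3_probs(weights, gamma):
    """Mix the normalized weights with uniform exploration (standard Exp3)."""
    total = sum(weights)
    k = len(weights)
    return [(1 - gamma) * w / total + gamma / k for w in weights]


def run_batched_exp3(gains, num_batches=200, batch_len=5,
                     gamma=0.1, eta=0.1, seed=0):
    """Pick one controller per batch with Exp3; return (counts, weights).

    Assumed toy system (not from the paper): x_next = (1.2 + K) * x + w,
    where K is the chosen feedback gain and w is a small disturbance.
    """
    rng = random.Random(seed)
    k = len(gains)
    weights = [1.0] * k
    counts = [0] * k
    x = 1.0
    for _ in range(num_batches):
        probs = exp3_probs(weights, gamma)
        i = rng.choices(range(k), weights=probs)[0]
        counts[i] += 1

        # Apply the chosen controller for the whole batch.
        cost = 0.0
        for _ in range(batch_len):
            x = (1.2 + gains[i]) * x + rng.uniform(-0.01, 0.01)
            x = max(-1e3, min(1e3, x))  # clip so the toy state stays finite
            cost += min(abs(x), 1.0)    # bounded per-step loss from the state norm
        loss = cost / batch_len         # batch loss in [0, 1]

        # Importance-weighted loss estimate and exponential weight update.
        estimate = loss / probs[i]
        weights[i] *= math.exp(-eta * estimate)

        # Rescale so the largest weight is 1; this leaves probs unchanged.
        top = max(weights)
        weights = [w / top for w in weights]
    return counts, weights


if __name__ == "__main__":
    # Hypothetical pool: two stabilizing gains and one destabilizing gain.
    counts, weights = run_batched_exp3([-0.9, -0.5, 0.0])
    print("selections per controller:", counts)
    print("final weights:", weights)
```

Only the norm of the state enters the loss, which loosely mirrors the abstract's remark that the learning rate relies on state norm information. With a fixed `batch_len`, a slowly converging controller may not reveal itself as stabilizing within one batch, which is the difficulty the abstract attributes to weaker stability notions.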