Staff View: Long-Run Conditional Value-at-Risk Reinforcement Learning :: Library Catalog

Bibliographic Details
Main Authors:	Wang, Qixin; Cao, Hao; Hu, Jian-Qiang; Hu, Mingjie; Xia, Li
Format:	Preprint
Published:	2026
Subjects:	Optimization and Control
Online Access:	https://arxiv.org/abs/2603.09734

_version_	1866911503252193280
author	Wang, Qixin; Cao, Hao; Hu, Jian-Qiang; Hu, Mingjie; Xia, Li
author_facet	Wang, Qixin; Cao, Hao; Hu, Jian-Qiang; Hu, Mingjie; Xia, Li
contents	Conditional value-at-risk (CVaR) is a prominent risk measure in financial engineering, energy systems, and supply chain management. In these domains, Markov decision processes (MDPs) with a long-run CVaR criterion effectively mitigate cost variability over a specified horizon. However, implementing MDPs relies on known transition models, which are typically unavailable in practice. This necessitates a model-free approach to risk-sensitive dynamic optimization. To tackle this challenge, we propose a reinforcement learning algorithm that simultaneously conducts policy evaluation and improvement based on a CVaR-specific Bellman local optimality equation. This algorithm employs a nonparametric incremental learning approach for policy improvement, relying on a single sample trajectory to identify the optimal policy. Under appropriate technical conditions, we prove almost sure convergence of the algorithm and derive its convergence rate. Our analysis reveals that the optimal convergence rate, measured by the mean absolute error of policy estimators, is of order O(1/n). Our main algorithm and results are further extended to solving the mean-CVaR optimization problem. Numerical experiments corroborate these results.
format	Preprint
id	arxiv_https___arxiv_org_abs_2603_09734
institution	arXiv
publishDate	2026
record_format	arxiv
title	Long-Run Conditional Value-at-Risk Reinforcement Learning
topic	Optimization and Control
url	https://arxiv.org/abs/2603.09734

Similar Items