Guardado en:
| Autores principales: | , , |
|---|---|
| Formato: | Preprint |
| Publicado: |
2026
|
| Materias: | |
| Acceso en línea: | https://arxiv.org/abs/2603.06145 |
| Etiquetas: |
Agregar Etiqueta
Sin Etiquetas, Sea el primero en etiquetar este registro!
|
| _version_ | 1866911544976080896 |
|---|---|
| author | Huang, Yu-Jui Yu, Xiang Zhang, Keyu |
| author_facet | Huang, Yu-Jui Yu, Xiang Zhang, Keyu |
| contents | For a general entropy-regularized time-inconsistent stochastic control problem, we propose a policy iteration algorithm (PIA) and establish its convergence to an equilibrium policy with an exponential convergence rate. The design of the PIA is based on a coupled system of non-local partial differential equations, called the exploratory equilibrium Hamilton--Jacobi--Bellman (EEHJB) equation. As opposed to the standard time-consistent case, policy improvement fails in general and the target value function (now an equilibrium value function) is not even known to exist a priori. To overcome these, we prove that the value functions generated by the PIA form a Cauchy sequence in a specialized Banach space, hence admit a limit, and the rate of convergence is exponential, on the strength of the Bismut--Elworthy--Li formula of stochastic representation. The limiting value function is shown to fulfill the EEHJB equation, which induces an equilibrium policy in a Gibbs form. Such convergence in value additionally implies uniform convergence of the generated policies to the equilibrium policy, again with an exponential rate. As a byproduct, the PIA gives a constructive proof of the global existence and uniqueness of a classical solution to our general EEHJB equation, whose well-posedness has not been explored in the literature. |
| format | Preprint |
| id |
arxiv_https___arxiv_org_abs_2603_06145 |
| institution | arXiv |
| publishDate | 2026 |
| record_format | arxiv |
| spellingShingle | Policy Iteration Achieves Regularized Equilibrium under Time Inconsistency Huang, Yu-Jui Yu, Xiang Zhang, Keyu Optimization and Control For a general entropy-regularized time-inconsistent stochastic control problem, we propose a policy iteration algorithm (PIA) and establish its convergence to an equilibrium policy with an exponential convergence rate. The design of the PIA is based on a coupled system of non-local partial differential equations, called the exploratory equilibrium Hamilton--Jacobi--Bellman (EEHJB) equation. As opposed to the standard time-consistent case, policy improvement fails in general and the target value function (now an equilibrium value function) is not even known to exist a priori. To overcome these, we prove that the value functions generated by the PIA form a Cauchy sequence in a specialized Banach space, hence admit a limit, and the rate of convergence is exponential, on the strength of the Bismut--Elworthy--Li formula of stochastic representation. The limiting value function is shown to fulfill the EEHJB equation, which induces an equilibrium policy in a Gibbs form. Such convergence in value additionally implies uniform convergence of the generated policies to the equilibrium policy, again with an exponential rate. As a byproduct, the PIA gives a constructive proof of the global existence and uniqueness of a classical solution to our general EEHJB equation, whose well-posedness has not been explored in the literature. |
| title | Policy Iteration Achieves Regularized Equilibrium under Time Inconsistency |
| topic | Optimization and Control |
| url | https://arxiv.org/abs/2603.06145 |