Staff View: :: Library Catalog

Saved in:

Bibliographic Details
Main Authors:	Junca, Mauricio, Leiva, Esteban
Format:	Preprint
Published:	2025
Subjects:	Machine Learning Optimization and Control
Online Access:	https://arxiv.org/abs/2505.21639
Tags:	Add Tag No Tags, Be the first to tag this record!

_version_	1866918358688989184
author	Junca, Mauricio Leiva, Esteban
author_facet	Junca, Mauricio Leiva, Esteban
contents	The relationship between inverse reinforcement learning (IRL) and inverse optimization (IO) for Markov decision processes (MDPs) has been relatively underexplored in the literature, despite addressing the same problem. In this work, we revisit the relationship between the IO framework for MDPs, IRL, and apprenticeship learning (AL). We incorporate prior beliefs on the structure of the cost function into the IRL and AL problems, and demonstrate that the convex-analytic view of the AL formalism emerges as a relaxation of our framework. Notably, the AL formalism is a special case in our framework when the regularization term is absent. Focusing on the suboptimal expert setting, we formulate the AL problem as a regularized min-max problem. The regularizer plays a key role in addressing the ill-posedness of IRL by guiding the search for plausible cost functions. To solve the resulting regularized-convex-concave-min-max problem, we use stochastic mirror descent (SMD) and establish convergence bounds for the proposed method. Numerical experiments highlight the critical role of regularization in learning cost vectors and apprentice policies.
format	Preprint
id	arxiv_https___arxiv_org_abs_2505_21639
institution	arXiv
publishDate	2025
record_format	arxiv
spellingShingle	Apprenticeship learning with prior beliefs using inverse optimization Junca, Mauricio Leiva, Esteban Machine Learning Optimization and Control The relationship between inverse reinforcement learning (IRL) and inverse optimization (IO) for Markov decision processes (MDPs) has been relatively underexplored in the literature, despite addressing the same problem. In this work, we revisit the relationship between the IO framework for MDPs, IRL, and apprenticeship learning (AL). We incorporate prior beliefs on the structure of the cost function into the IRL and AL problems, and demonstrate that the convex-analytic view of the AL formalism emerges as a relaxation of our framework. Notably, the AL formalism is a special case in our framework when the regularization term is absent. Focusing on the suboptimal expert setting, we formulate the AL problem as a regularized min-max problem. The regularizer plays a key role in addressing the ill-posedness of IRL by guiding the search for plausible cost functions. To solve the resulting regularized-convex-concave-min-max problem, we use stochastic mirror descent (SMD) and establish convergence bounds for the proposed method. Numerical experiments highlight the critical role of regularization in learning cost vectors and apprentice policies.
title	Apprenticeship learning with prior beliefs using inverse optimization
topic	Machine Learning Optimization and Control
url	https://arxiv.org/abs/2505.21639

Similar Items