Bibliographic Details
Main Authors: Fuentes-Vicente, Laura, Even, Mathieu, Dormion, Gaelle, Josse, Julie, Chambaz, Antoine
Format: Preprint
Published: 2026
Subjects: Methodology
Online Access: https://arxiv.org/abs/2601.22717
_version_ 1866915763465486336
author Fuentes-Vicente, Laura
Even, Mathieu
Dormion, Gaelle
Josse, Julie
Chambaz, Antoine
contents A medical policy aims to support decision-making by mapping patient characteristics to individualized treatment recommendations. Standard approaches typically optimize a single outcome criterion. For example, recommending treatment according to the sign of the Conditional Average Treatment Effect (CATE) maximizes the policy "value" by exploiting treatment effect heterogeneity. This point of view shifts policy learning towards the challenge of learning a reliable CATE estimator. However, in multi-outcome settings, such strategies ignore the risk of adverse events, despite their relevance. PLUC (Policy Learning Under Constraint) addresses this challenge by learning an estimator of the CATE that yields smoothed policies controlling the probability of an adverse event in observational settings. Inspired by insights from EP-learning, PLUC involves the optimization of strongly convex Lagrangian criteria over a convex hull of functions. Its alternating procedure iteratively applies the Frank-Wolfe algorithm to minimize the current criterion, then performs a targeting step that updates the criterion so that its evaluations at previously visited landmarks become targeted estimators of the corresponding theoretical quantities. An R package PLUC-R provides a practical implementation. We illustrate PLUC's performance through a series of numerical experiments.
format Preprint
id arxiv_https___arxiv_org_abs_2601_22717
institution arXiv
publishDate 2026
record_format arxiv
title Policy learning under constraint: Maximizing a primary outcome while controlling an adverse event
topic Methodology
url https://arxiv.org/abs/2601.22717
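
The abstract mentions applying the Frank-Wolfe algorithm to minimize a strongly convex criterion over a convex hull. As a rough illustration of that generic building block only, the sketch below runs textbook Frank-Wolfe over the convex hull of a few fixed points. It is not the PLUC procedure and not the PLUC-R API: the function name `frank_wolfe`, the toy quadratic criterion, the `atoms`, and the step-size rule are all assumptions chosen for illustration, and the targeting step described in the abstract is not represented.

```python
def frank_wolfe(atoms, grad, n_iters=2000):
    """Minimize a smooth convex function over the convex hull of `atoms`.

    atoms: list of points (lists of floats) spanning the feasible set.
    grad:  function returning the gradient (list of floats) at a point.
    Both names are hypothetical; this is generic Frank-Wolfe, not PLUC.
    """
    x = list(atoms[0])  # any atom is a feasible starting point
    for k in range(n_iters):
        g = grad(x)
        # Linear minimization oracle: a linear function over a convex hull
        # attains its minimum at one of the atoms.
        s = min(atoms, key=lambda a: sum(gi * ai for gi, ai in zip(g, a)))
        step = 2.0 / (k + 2.0)  # classical open-loop step size
        # Move towards the selected atom; the iterate stays in the hull.
        x = [(1.0 - step) * xi + step * si for xi, si in zip(x, s)]
    return x


# Toy strongly convex criterion (an assumption): f(x) = 0.5 * ||x - target||^2.
target = [0.3, 0.4]
atoms = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
solution = frank_wolfe(atoms, lambda x: [xi - ti for xi, ti in zip(x, target)])
```

Because each update is a convex combination of the current iterate and an atom, no projection is needed to stay inside the hull, which is the usual reason for preferring Frank-Wolfe on feasible sets of this kind.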