Staff View: Learning Actionable Counterfactual Explanations in Large State Spaces :: Library Catalog

Bibliographic Details
Main Authors:	Naggita, Keziah; Walter, Matthew R.; Blum, Avrim
Format:	Preprint
Published:	2024
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2404.17034

_version_	1866913871404466176
author	Naggita, Keziah; Walter, Matthew R.; Blum, Avrim
author_facet	Naggita, Keziah; Walter, Matthew R.; Blum, Avrim
contents	Recourse generators provide actionable insights, often through feature-based counterfactual explanations (CFEs), to help negatively classified individuals understand how to adjust their input features to achieve a positive classification. These feature-based CFEs, which we refer to as low-level CFEs, are overly specific (e.g., coding experience: \(4 \to 5+\) years) and often recommended in a feature space that doesn't straightforwardly align with real-world actions. To bridge this gap, we introduce three novel recourse types grounded in real-world actions: high-level continuous (hl-continuous), high-level discrete (hl-discrete), and high-level ID (hl-id) CFEs. We formulate single-agent CFE generation methods, where we model the hl-discrete CFE as a solution to a weighted set cover problem and the hl-continuous CFE as a solution to an integer linear program. Since these methods require costly optimization per agent, we propose data-driven CFE generation approaches that, given instances of agents and their optimal CFEs, learn a CFE generator that quickly provides optimal CFEs for new agents. This approach, which can also be viewed as learning an optimal policy in a family of large but deterministic MDPs, considers several problem formulations, including formulations in which the actions and their effects are unknown, and therefore addresses both informational and computational challenges. We conduct extensive empirical evaluations using healthcare datasets (BRFSS, Foods, and NHANES) and fully synthetic data. For negatively classified agents identified by linear or threshold-based classifiers, we compare high-level CFEs to low-level CFEs and assess the effectiveness of our network-based, data-driven approaches. Results show that the data-driven CFE generators are accurate and resource-efficient, and that high-level CFEs offer key advantages over low-level CFEs.
format	Preprint
id	arxiv_https___arxiv_org_abs_2404_17034
institution	arXiv
publishDate	2024
record_format	arxiv
title	Learning Actionable Counterfactual Explanations in Large State Spaces
topic	Machine Learning
url	https://arxiv.org/abs/2404.17034

Similar Items