Staff View: Explaining the Behavior of Black-Box Prediction Algorithms with Causal Learning :: Library Catalog

Bibliographic Details
Main Authors:	Sani, Numair; Malinsky, Daniel; Shpitser, Ilya
Format:	Preprint
Published:	2020
Subjects:	Machine Learning; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2006.02482

_version_	1866912528169172992
author	Sani, Numair; Malinsky, Daniel; Shpitser, Ilya
author_facet	Sani, Numair; Malinsky, Daniel; Shpitser, Ilya
contents	Causal approaches to post-hoc explainability for black-box prediction models (e.g., deep neural networks trained on image pixel data) have become increasingly popular. However, existing approaches have two important shortcomings: (i) the "explanatory units" are micro-level inputs into the relevant prediction model, e.g., image pixels, rather than interpretable macro-level features that are more useful for understanding how to possibly change the algorithm's behavior, and (ii) existing approaches assume there exists no unmeasured confounding between features and target model predictions, which fails to hold when the explanatory units are macro-level variables. Our focus is on the important setting where the analyst has no access to the inner workings of the target prediction algorithm, rather only the ability to query the output of the model in response to a particular input. To provide causal explanations in such a setting, we propose to learn causal graphical representations that allow for arbitrary unmeasured confounding among features. We demonstrate the resulting graph can differentiate between interpretable features that causally influence model predictions versus those that are merely associated with model predictions due to confounding. Our approach is motivated by a counterfactual theory of causal explanation wherein good explanations point to factors that are "difference-makers" in an interventionist sense.
format	Preprint
id	arxiv_https___arxiv_org_abs_2006_02482
institution	arXiv
publishDate	2020
record_format	arxiv
title	Explaining the Behavior of Black-Box Prediction Algorithms with Causal Learning
topic	Machine Learning; Artificial Intelligence
url	https://arxiv.org/abs/2006.02482

Similar Items