Staff View: Shapley Marginal Surplus for Strong Models

Bibliographic Details
Main Authors:	de Marchi, Daniel; Kosorok, Michael; de Marchi, Scott
Format:	Preprint
Published:	2024
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2408.08845

_version_	1866916359849377792
author	de Marchi, Daniel; Kosorok, Michael; de Marchi, Scott
author_facet	de Marchi, Daniel; Kosorok, Michael; de Marchi, Scott
contents	Shapley values have seen widespread use in machine learning as a way to explain model predictions and estimate the importance of covariates. Accurately explaining models is critical in real-world applications, both to aid decision making and to infer the properties of the true data-generating process (DGP). In this paper, we demonstrate that while model-based Shapley values might be accurate explainers of model predictions, machine learning models themselves are often poor explainers of the DGP, even if the model is highly accurate. Particularly in the presence of interrelated or noisy variables, the output of a highly predictive model may fail to account for these relationships. This implies that explanations of a trained model's behavior may fail to provide meaningful insight into the DGP. We introduce a novel variable importance algorithm, Shapley Marginal Surplus for Strong Models, that samples the space of possible models to produce an inferential measure of feature importance. We compare this method to other popular feature importance methods, both Shapley-based and non-Shapley-based, and demonstrate significant outperformance in inferential capabilities relative to those methods.
format	Preprint
id	arxiv_https___arxiv_org_abs_2408_08845
institution	arXiv
publishDate	2024
record_format	arxiv
title	Shapley Marginal Surplus for Strong Models
topic	Machine Learning
url	https://arxiv.org/abs/2408.08845
