Bibliographic Details
Main Authors: de Marchi, Daniel; Kosorok, Michael; de Marchi, Scott
Format: Preprint
Published: 2024
Subjects: Machine Learning
Online Access: https://arxiv.org/abs/2408.08845
author de Marchi, Daniel
Kosorok, Michael
de Marchi, Scott
contents Shapley values are widely used in machine learning to explain model predictions and to estimate the importance of covariates. Accurate model explanations are critical in real-world applications, both to aid decision making and to infer the properties of the true data-generating process (DGP). In this paper, we demonstrate that while model-based Shapley values might be accurate explainers of model predictions, machine learning models themselves are often poor explainers of the DGP, even when the model is highly accurate. Particularly in the presence of interrelated or noisy variables, the output of a highly predictive model may fail to account for these relationships. This implies that explanations of a trained model's behavior may fail to provide meaningful insight into the DGP. We introduce a novel variable importance algorithm, Shapley Marginal Surplus for Strong Models, that samples the space of possible models to produce an inferential measure of feature importance. We compare this method to other popular feature importance methods, both Shapley-based and non-Shapley-based, and demonstrate significantly better inferential performance than these methods.
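For readers unfamiliar with the Shapley value that the abstract builds on, the following is a minimal sketch of the textbook exact computation for a small cooperative game. It is not the paper's Shapley Marginal Surplus for Strong Models algorithm, which the record does not describe in detail. The function names, the feature names x1, x2, x3, and the toy value function are all hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values for a cooperative game.

    players: list of hashable player ids (here, hypothetical feature names).
    value:   function mapping a frozenset of players to a float payoff.
    """
    n = len(players)
    phi = {p: 0.0 for p in players}
    for p in players:
        others = [q for q in players if q != p]
        for k in range(n):
            # Weight of a coalition of size k that does not contain p.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of p when it joins coalition s.
                phi[p] += weight * (value(s | {p}) - value(s))
    return phi


def toy_value(s):
    """Hypothetical payoff: x1 and x2 are redundant (either one alone
    contributes 0.6), while x3 contributes 0.3 independently."""
    v = 0.0
    if "x1" in s or "x2" in s:
        v += 0.6
    if "x3" in s:
        v += 0.3
    return v


phi = shapley_values(["x1", "x2", "x3"], toy_value)
for name, score in phi.items():
    print(name, round(score, 3))
```

In this toy game the two redundant features split their shared 0.6 between them, so each ends up with the same score as the weaker but independent x3. This is one simple way to see how interrelated variables can complicate the reading of importance scores, the kind of situation the abstract highlights.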
format Preprint
id arxiv_https___arxiv_org_abs_2408_08845
institution arXiv
publishDate 2024
record_format arxiv
title Shapley Marginal Surplus for Strong Models
topic Machine Learning
url https://arxiv.org/abs/2408.08845