Staff View: Do Metrics for Counterfactual Explanations Align with User Perception? :: Library Catalog

Bibliographic Details
Main Authors:	Liedeker, Felix; Ell, Basil; Cimiano, Philipp; Düsing, Christoph
Format:	Preprint
Published:	2026
Subjects:	Artificial Intelligence; Human-Computer Interaction
Online Access:	https://arxiv.org/abs/2603.15607

_version_	1866911520354467840
author	Liedeker, Felix; Ell, Basil; Cimiano, Philipp; Düsing, Christoph
author_facet	Liedeker, Felix; Ell, Basil; Cimiano, Philipp; Düsing, Christoph
contents	Explainability is widely regarded as essential for trustworthy artificial intelligence systems. However, the metrics commonly used to evaluate counterfactual explanations are algorithmic evaluation metrics that are rarely validated against human judgments of explanation quality. This raises the question of whether such metrics meaningfully reflect user perceptions. We address this question through an empirical study that directly compares algorithmic evaluation metrics with human judgments across three datasets. Participants rated counterfactual explanations along multiple dimensions of perceived quality, which we relate to a comprehensive set of standard counterfactual metrics. We analyze both individual relationships and the extent to which combinations of metrics can predict human assessments. Our results show that correlations between algorithmic metrics and human ratings are generally weak and strongly dataset-dependent. Moreover, increasing the number of metrics used in predictive models does not lead to reliable improvements, indicating structural limitations in how current metrics capture criteria relevant for humans. Overall, our findings suggest that widely used counterfactual evaluation metrics fail to reflect key aspects of explanation quality as perceived by users, underscoring the need for more human-centered approaches to evaluating explainable artificial intelligence.
format	Preprint
id	arxiv_https___arxiv_org_abs_2603_15607
institution	arXiv
publishDate	2026
record_format	arxiv
title	Do Metrics for Counterfactual Explanations Align with User Perception?
topic	Artificial Intelligence; Human-Computer Interaction
url	https://arxiv.org/abs/2603.15607
