Staff View: Beyond Output Faithfulness: Learning Attributions that Preserve Computational Pathways :: Library Catalog

Bibliographic Details
Main Authors:	Zhang, Siyu; Mcmillan, Kenneth
Format:	Preprint
Published:	2025
Subjects:	Machine Learning; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2509.04588

_version_	1866918230662053888
author	Zhang, Siyu Mcmillan, Kenneth
author_facet	Zhang, Siyu Mcmillan, Kenneth
contents	Faithfulness metrics such as insertion and deletion evaluate how feature removal affects model outputs but overlook whether explanations preserve the computational pathway the network actually uses. We show that external metrics can be maximized through alternative pathways -- perturbations that reroute computation via different feature detectors while preserving output behavior. To address this, we propose activation preservation as a tractable proxy for preserving computational pathways. We introduce Faithfulness-guided Ensemble Interpretation (FEI), which jointly optimizes external faithfulness (via ensemble quantile optimization of insertion/deletion curves) and internal faithfulness (via selective gradient clipping). Across VGG and ResNet on ImageNet and CUB-200-2011, FEI achieves state-of-the-art insertion/deletion scores while maintaining significantly lower activation deviation, showing that both external and internal faithfulness are essential for reliable explanations.
format	Preprint
id	arxiv_https___arxiv_org_abs_2509_04588
institution	arXiv
publishDate	2025
record_format	arxiv
spellingShingle	Beyond Output Faithfulness: Learning Attributions that Preserve Computational Pathways Zhang, Siyu Mcmillan, Kenneth Machine Learning Artificial Intelligence Faithfulness metrics such as insertion and deletion evaluate how feature removal affects model outputs but overlook whether explanations preserve the computational pathway the network actually uses. We show that external metrics can be maximized through alternative pathways -- perturbations that reroute computation via different feature detectors while preserving output behavior. To address this, we propose activation preservation as a tractable proxy for preserving computational pathways. We introduce Faithfulness-guided Ensemble Interpretation (FEI), which jointly optimizes external faithfulness (via ensemble quantile optimization of insertion/deletion curves) and internal faithfulness (via selective gradient clipping). Across VGG and ResNet on ImageNet and CUB-200-2011, FEI achieves state-of-the-art insertion/deletion scores while maintaining significantly lower activation deviation, showing that both external and internal faithfulness are essential for reliable explanations.
title	Beyond Output Faithfulness: Learning Attributions that Preserve Computational Pathways
topic	Machine Learning Artificial Intelligence
url	https://arxiv.org/abs/2509.04588

Similar Items