Staff View: Causal Attribution via Activation Patching :: Library Catalog

Bibliographic Details
Main Authors:	Izadi, Amirmohammad; Banayeeanzade, Mohammadali; Mirrokni, Alireza; Hasani, Hosein; Bagherian, Mobin; Mehri, Faridoun; Baghshah, Mahdieh Soleymani
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2603.13652

_version_	1866910227609157632
author	Izadi, Amirmohammad; Banayeeanzade, Mohammadali; Mirrokni, Alireza; Hasani, Hosein; Bagherian, Mobin; Mehri, Faridoun; Baghshah, Mahdieh Soleymani
author_facet	Izadi, Amirmohammad; Banayeeanzade, Mohammadali; Mirrokni, Alireza; Hasani, Hosein; Bagherian, Mobin; Mehri, Faridoun; Baghshah, Mahdieh Soleymani
contents	Attribution methods for Vision Transformers (ViTs) aim to identify image regions that influence model predictions, but producing faithful and well-localized attributions remains challenging. Existing attribution methods face several limitations: gradient-based, relevance-propagation, and attention-based methods rely on local approximations, while perturbation- or optimization-based methods intervene on inputs, tokens, or surrogates rather than on internal patch representations. The key challenge is that class-relevant evidence is formed through interactions between patch tokens across layers; methods that operate only on input changes, attention weights, or backward relevance signals may therefore provide indirect proxies for patch importance rather than directly testing the predictive effect of contextualized patch representations. We propose Causal Attribution via Activation Patching (CAAP), which estimates the contribution of individual image patches to the ViT's prediction by directly intervening on internal activations rather than using learned masks or synthetic perturbation patterns. For each patch, CAAP inserts the corresponding source-image activations into a neutral target context over an intermediate range of layers and uses the resulting target-class score as the attribution signal. The resulting attribution map reflects the causal contribution of patch-associated internal representations to the model's prediction. The causal intervention serves as a principled measure of patch influence by capturing semantic evidence after initial representation formation, while avoiding late-layer global mixing that can reduce spatial specificity. Across multiple ViT backbones and standard metrics, CAAP consistently outperforms existing methods in various settings and produces more faithful and localized attributions.
format	Preprint
id	arxiv_https___arxiv_org_abs_2603_13652
institution	arXiv
publishDate	2026
record_format	arxiv
title	Causal Attribution via Activation Patching
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2603.13652