| Main Author: | Kim, Minyeong |
|---|---|
| Format: | Digital resource |
| Language: | English |
| Published: | Zenodo, 2026 |
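| Subjects: | Explainable Artificial Intelligence; Attribution Stability; Distribution Shift; Common Corruptions; Vision Transformer; Model-Agnostic Explanation |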
| Online Access: | https://doi.org/10.5281/zenodo.19836755 |
| _version_ | 1866901838907834368 |
|---|---|
| author | Kim, Minyeong |
| author_facet | Kim, Minyeong |
| contents | <p><strong>Status:</strong> Submitted version (preprint). Currently under review at <em>Machine Learning</em> (Springer Nature, journal 10994). Submission ID: 8f29c533-4613-48a7-9cbf-1f4eddcc83fa.</p><p><strong>Abstract.</strong> Post-hoc attribution methods are widely deployed to explain deep vision classifiers, yet no systematic evaluation protocol exists for the corrupted-input regime. This paper introduces the first such protocol for attribution stability under distribution shift, validated through a factorial audit of five attribution methods (Integrated Gradients, Grad-CAM, SmoothGrad, GradientSHAP, LIME) under fifteen corruptions at five severity levels on CIFAR-10-C and CIFAR-100-C, across two architecturally distinct classifiers (ResNet-50, ViT-B/16), yielding 760,000 clean-corrupted pairs and five stability metrics.</p><p><strong>Key findings.</strong> (1) Attribution stability declines monotonically with corruption severity for all methods (12 to 13 of 15 corruption types significant under Benjamini-Hochberg correction). (2) Degradation depends dramatically on method: SmoothGrad retains Spearman 0.91 for brightness at severity 3 while LIME falls to 0.04 (twenty-fold gap). (3) The resolution-fair ranking (SmoothGrad, IG, GradientSHAP) is consistent across 96% of cells (144/150) with method eta^2 > 0.84 dwarfing architecture eta^2 < 0.02. (4) The architecture-by-method interaction is non-significant on CIFAR-10 (p = 0.71) but significant on CIFAR-100 (p = 0.035), revealing task-complexity modulation of the model-agnostic claim.</p><p><strong>Files.</strong> (a) <code>paper2_manuscript_preprint.pdf</code>: the submitted manuscript (42 pages, compiled with Springer Nature sn-jnl.cls). (b) <code>paper2_latex_source.zip</code>: full LaTeX source bundle for compilation reproducibility.</p><p><strong>Reproducibility code (separate record).</strong> The analysis code and aggregated statistics are archived at <a href="https://doi.org/10.5281/zenodo.19689329">10.5281/zenodo.19689329</a> (v1.1.0).</p><p><strong>License.</strong> CC BY 4.0.</p><p><strong>Note on peer review.</strong> This preprint is the submitted version prior to peer review. If accepted, a revised Author Accepted Manuscript will be uploaded as a new version after any applicable Springer Nature embargo.</p> <h3>Version History</h3> <p><strong>v2 (2026-04-28): TMLR submission revision</strong></p> <p>Key improvements over v1:</p> <ul> <li>3.2 Datasets: Added pre-emptive justification of CIFAR-10/100-C scope choice</li> <li>5.1.1 LIME: Reframed sign-reversal as a deployment-relevant finding</li> <li>7 Conclusion: Future work expanded with Cohen 2019, Angelopoulos, Heskes</li> <li>10 Statements: AI declaration restructured into explicit dual-list</li> <li>2.7 Related Work: Added SAE interpretability discussion</li> <li>3.5 Methodology: Added 6-step pipeline pseudocode</li> <li>All Tables: Added directional arrows for notation clarity</li> <li>Bibliography: 68 references</li> <li>HuggingFace checkpoint URLs moved to supplementary for double-blind</li> </ul> <p>Total: 35 pages, 934 KB PDF.</p> |
| format | Digital resource |
| id | zenodo_https___doi_org_10_5281_zenodo_19836755 |
| institution | Zenodo |
| language | eng |
| publishDate | 2026 |
| publisher | Zenodo |
| record_format | zenodo |
| title | Systematic Vulnerability Audit of Post-Hoc XAI under Common Corruptions: A Factor Analysis Across Vision Benchmarks |
| topic | Explainable Artificial Intelligence Attribution Stability Distribution Shift Common Corruptions Vision Transformer Model-Agnostic Explanation |
| url | https://doi.org/10.5281/zenodo.19836755 |
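
The abstract reports stability as a Spearman correlation between the attribution map of a clean image and that of its corrupted counterpart (for example 0.91 for SmoothGrad versus 0.04 for LIME on brightness at severity 3). The record does not include the authors' implementation, so the following is only a minimal sketch of how such a rank correlation could be computed on two flattened maps. The function names `average_ranks` and `spearman` and the toy values are hypothetical and are not taken from the paper or its code archive.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        tied_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = tied_rank
        start = end + 1
    return ranks


def spearman(clean_map, corrupted_map):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    if len(clean_map) != len(corrupted_map):
        raise ValueError("attribution maps must have the same number of pixels")
    rx, ry = average_ranks(clean_map), average_ranks(corrupted_map)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return float("nan")  # a constant map has no defined rank correlation
    return cov / (var_x * var_y) ** 0.5


# Hypothetical flattened attribution maps (illustrative, not data from the paper).
clean = [0.9, 0.1, 0.4, 0.7, 0.2, 0.0]
same_order = [1.8, 0.3, 0.9, 1.5, 0.5, 0.1]  # rescaled, pixel ranking preserved
reversed_order = [-v for v in clean]         # pixel ranking fully inverted

if __name__ == "__main__":
    print(spearman(clean, same_order))
    print(spearman(clean, reversed_order))
```

Because the score depends only on ranks, a corruption that rescales attributions without reordering pixels leaves it unchanged, while a value near zero indicates that the corrupted map ranks pixels almost independently of the clean one.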