Staff View: Position: Mechanistic Interpretability Must Disclose Identification Assumptions for Causal Claims :: Library Catalog

Bibliographic Details
Main Authors:	Lin, Zezheng; Liu, Fengming
Format:	Preprint
Published:	2026
Subjects:	Machine Learning; Artificial Intelligence; Computation and Language; 68T07; I.2.6; I.2.0
Online Access:	https://arxiv.org/abs/2605.08012

_version_	1866915994148012032
author	Lin, Zezheng; Liu, Fengming
author_facet	Lin, Zezheng; Liu, Fengming
contents	Mechanistic interpretability papers increasingly use causal vocabulary: circuits, mediators, causal abstraction, monosemanticity. Such claims require explicit identification assumptions. A purposive audit of 10 papers across four methodological strands finds no dedicated identification-assumptions section and a recurring pattern: validation metrics such as faithfulness, completeness, monosemanticity, alignment, or ablation effects are reported as causal support without stating the assumptions that make them identifying. A two-human-coder audit on $n=30$ reproduces the direction of the main finding: dedicated identification sections are absent, and validation-metric substitution is common, though exact Dim B/D counts are coding-rule sensitive. The paper proposes a disclosure norm: state whether the claim is causal, name the identification strategy, enumerate assumptions, stress at least one, and explain how conclusions shift if assumptions fail. Validation is not identification.
format	Preprint
id	arxiv_https___arxiv_org_abs_2605_08012
institution	arXiv
publishDate	2026
record_format	arxiv
title	Position: Mechanistic Interpretability Must Disclose Identification Assumptions for Causal Claims
topic	Machine Learning; Artificial Intelligence; Computation and Language; 68T07; I.2.6; I.2.0
url	https://arxiv.org/abs/2605.08012
