| Main Author: | |
|---|---|
| Format: | Digital resource |
| Language: | |
| Published: | Zenodo, 2026 |
| Online Access: | https://doi.org/10.5281/zenodo.19672904 |
Table of Contents:
- This paper revisits an apparent self-reference signal in Gemma 3 4B after a corrected rerun of a 32-triplet MLP-ablation study. Across 6,528 interventions spanning all 34 layers, the corrected v2 analysis yields one FDR-significant condition: zero-ablation self-vs-third-person comparisons in self-recognition, with significant layers 8, 12, 13, 18, 19, and 28. The historically salient Layer 26 shows small self-minus-control residuals across self-recognition, capability awareness, and metacognition (0.164, -0.045, and -0.016 nats; all two-sided p > 0.31), which is more consistent with first-person framing than with AI-specific self-reference. Training-knowledge prompts, the strongest signal in an earlier v1 zero-ablation pilot, yield no FDR-significant layers under the tighter v2 design, though that result remains hard to interpret because prompt matching is weakest in that domain. A supplementary v3 attention-head follow-up yields a small exploratory set of uncorrected two-sided residual deviations, but no robust FDR-screened cross-domain head alternative. Within this English prompt setting, the corrected evidence is more consistent with a narrow first-person-framing account than with replicated AI-specific self-reference in Gemma 3 4B’s MLP layers.
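
The abstract reports layers as FDR-significant but does not name the correction procedure. As a minimal sketch, assuming a Benjamini-Hochberg step-up procedure applied to one p-value per layer, the screening could look like the following. The function name, the 0.05 level, and the example p-values are illustrative assumptions, not values from the paper.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per input p-value: True if rejected at FDR level alpha.

    Assumption: Benjamini-Hochberg step-up; the paper may use a different
    FDR procedure or a different alpha.
    """
    m = len(p_values)
    # Indices of the p-values, ordered from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Largest rank k (1-based) whose p-value satisfies p_(k) <= (k / m) * alpha.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank

    # Step-up rule: reject every hypothesis ranked at or below max_k.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_k:
            rejected[idx] = True
    return rejected


# Hypothetical per-layer p-values (made up for illustration only).
example_p = [0.01, 0.04, 0.03, 0.5]
significant = [i for i, r in enumerate(benjamini_hochberg(example_p)) if r]
print(significant)
```

Because the rule is step-up, a p-value that misses its own rank threshold can still be rejected when a larger p-value further down the ordering meets its threshold. This is one reason an FDR-screened layer set can differ from a set chosen by a fixed per-layer cutoff.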