Bibliographic Details
Main Author: Akalpler, Ahmet
Format: Digital resource
Language:
Published: Zenodo 2026
Online Access: https://doi.org/10.5281/zenodo.19672904
Abstract:
  • This paper revisits an apparent self-reference signal in Gemma 3 4B after a corrected rerun of a 32-triplet MLP-ablation study. Across 6,528 interventions spanning all 34 layers, the corrected v2 analysis yields one FDR-significant condition: zero-ablation self-vs-third-person comparisons in self-recognition, with significant layers 8, 12, 13, 18, 19, and 28. Historically salient Layer 26 shows small self-minus-control residuals across self-recognition, capability awareness, and metacognition (0.164, -0.045, and -0.016 nats; all two-sided p > 0.31), which is more consistent with first-person framing than with AI-specific self-reference. Training-knowledge prompts, the strongest signal in an earlier v1 zero-ablation pilot, yield no FDR-significant layers under the tighter v2 design, though that result remains hard to interpret because prompt matching is weakest in that domain. A supplementary v3 attention-head follow-up yields a small exploratory set of uncorrected two-sided residual deviations, but no robust FDR-screened cross-domain head alternative. Within this English prompt setting, the corrected evidence is more consistent with a narrow first-person-framing account than with replicated AI-specific self-reference in Gemma 3 4B’s MLP layers.
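For readers unfamiliar with FDR screening across layers, the following is a minimal sketch of one common procedure, the Benjamini-Hochberg step-up rule. The record does not state which FDR method the paper uses, so the choice of Benjamini-Hochberg, the function name `benjamini_hochberg`, the `alpha` level, and the p-values in the usage lines are all assumptions made for illustration, not the author's analysis or data.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per p-value: True if rejected at FDR level alpha.

    Assumption: Benjamini-Hochberg step-up; the paper's actual FDR
    procedure is not specified in this record.
    """
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k whose p-value satisfies p_(k) <= (k / m) * alpha.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank

    # Reject every hypothesis ranked at or below max_k (the step-up rule).
    rejected = [False] * m
    for idx in order[:max_k]:
        rejected[idx] = True
    return rejected


# Hypothetical per-layer p-values (made up for illustration only).
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
example_flags = benjamini_hochberg(example_p, alpha=0.05)
significant_indices = [i for i, flag in enumerate(example_flags) if flag]
```

In a layer-wise ablation setting, one p-value per layer would be passed in, and the returned flags would mark which layers survive the screen. Because the rule is step-up, a p-value that misses its own rank threshold can still be rejected when a larger p-value meets a later threshold.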