| Main author: | |
|---|---|
| Format: | Digital resource |
| Language: | |
| Published: | Zenodo 2026 |
| Subject headings: | |
| Online access: | https://doi.org/10.5281/zenodo.20338451 |
Table of contents:
- <p>We apply the matrix Amari-Schwarzian trace invariants f_{2k} = tr([U,Q]^{2k}) from Paper 7 (Maino, 2026) to the hidden-state geometry of a transformer language model (DeepSeek-R1-Distill-Llama-8B) during autoregressive generation. The reference matrix Q is derived from the Gram structure of the model's output projection (lm_head weight matrix); hidden states are reshaped as d x d matrices (d = 64, d^2 = 4096 = hidden dimension). Three empirical findings emerge. (1) A symplectic gauge transition: the odd-trace ratio |f_3|/||[U,Q]||^2_F drops from 0.017 at the output layer to 0.002 at mid-depth, consistent with the vanishing predicted by Lemma 5 of Paper 7. (2) Multi-genus persistence: corr(f_2, f_4) = 0.21-0.39 across all tested depths, supporting the independence predicted by Conjecture 6 of Paper 7. (3) A damped velocity-Verlet corrector minimising ||[U,Q]||^2_F at layer 24 produces a 66% reduction in commutator Frobenius norm. In a three-way ablation (baseline vs Hamiltonian correction vs matched-norm random perturbation, 62 interventions at identical schedule), the Hamiltonian direction reduces 3-gram repetition by 20% while the random direction increases it by 9%, establishing that the geometric correction direction is causally significant. The experimental progression from v1.0 (inert due to CFL stiffness in a log-kappa potential, dt = 10^{-6}) through v2.0 (f_2 sign bug: tr(C^2) can be negative for non-symmetric C), v2.1 (70% reduction using the polynomial Frobenius potential, dt = 0.03), to v3.2 (three-way causal ablation) illustrates the engineering chain from theory to validated intervention. The negative result at standard temperature (geometric health is orthogonal to semantic selection when model confidence is high, entropy ~ 0.3 nats) is itself informative about the geometry-semantics boundary in transformers: the correction only crosses logit decision boundaries in the degenerate regime (T = 1.2, entropy ~ 1 nat). 
The CFL stiffness hierarchy (log-kappa potential, dt ~ 10^{-6}, vs Frobenius potential, dt ~ 0.03, a 30000x larger stable step) demonstrates that polynomial Hamiltonian potentials avoid the singular-value stiffness that makes rational potentials inert for neural network hidden states at condition numbers kappa ~ 10^5. All experiments were conducted on Google Colab (single GPU) using the publicly available DeepSeek-R1-Distill-Llama-8B model. Reproduction code is included.</p>
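The quantities named in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' reproduction code. It takes the reference matrix Q as given (the record does not say how it is built from the lm_head Gram structure), uses a tiny 3 x 3 example instead of d = 64, and assumes one simple form of damping (velocity scaled by 1 - gamma each step). The helper names, gamma, and the step count are hypothetical; only dt = 0.03 and the potential ||[U,Q]||^2_F come from the abstract.

```python
def matmul(A, B):
    """Product of two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transpose(A):
    return [list(row) for row in zip(*A)]

def axpy(a, X, Y):
    """Return a*X + Y elementwise."""
    return [[a * x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def reshape_square(h, d):
    """View a flat hidden state of length d*d as a d x d matrix."""
    return [list(h[i * d:(i + 1) * d]) for i in range(d)]

def commutator(U, Q):
    """C = [U, Q] = UQ - QU."""
    return axpy(-1.0, matmul(Q, U), matmul(U, Q))

def trace_invariant(U, Q, k):
    """f_k = tr([U, Q]^k). Note f_2 = tr(C^2) may be negative."""
    C = commutator(U, Q)
    P = C
    for _ in range(k - 1):
        P = matmul(P, C)
    return sum(P[i][i] for i in range(len(P)))

def frobenius_potential(U, Q):
    """V(U) = ||[U, Q]||_F^2, non-negative by construction."""
    return sum(c * c for row in commutator(U, Q) for c in row)

def force(U, Q):
    """-dV/dU = -2 (C Q^T - Q^T C), with C = [U, Q]."""
    C, Qt = commutator(U, Q), transpose(Q)
    G = axpy(-1.0, matmul(Qt, C), matmul(C, Qt))
    return [[-2.0 * g for g in row] for row in G]

def damped_verlet(U, Q, dt=0.03, gamma=0.1, steps=200):
    """Velocity-Verlet on V(U) with an assumed multiplicative damping."""
    n = len(U)
    vel = [[0.0] * n for _ in range(n)]
    F = force(U, Q)
    for _ in range(steps):
        vel = axpy(0.5 * dt, F, vel)   # half kick
        U = axpy(dt, vel, U)           # drift
        F = force(U, Q)
        vel = axpy(0.5 * dt, F, vel)   # second half kick
        vel = [[(1.0 - gamma) * v for v in row] for row in vel]  # damping
    return U

# Toy data: a symmetric stand-in for Q and a 9-entry "hidden state".
Q = [[2.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 4.0]]
U0 = reshape_square([0.5, -1.0, 2.0, 0.3, 1.5, -0.7, 1.1, 0.2, -0.4], 3)

before = frobenius_potential(U0, Q)
after = frobenius_potential(damped_verlet(U0, Q), Q)
print("||[U,Q]||_F^2 before:", before, "after:", after)
print("f_2, f_3, f_4:", [trace_invariant(U0, Q, k) for k in (2, 3, 4)])
```

One property worth checking with this sketch: when U and Q are both symmetric, [U,Q] is antisymmetric, so tr(C^2) equals minus the squared Frobenius norm and every odd trace vanishes. That is why tr(C^2) cannot stand in for the norm, which is the sign issue the abstract attributes to v2.0.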