Bibliographic Details
Main author: Li, Y.Y.N.
Format: Digital resource
Language:
Published: Zenodo 2026
Online access: https://doi.org/10.5281/zenodo.18930213
_version_ 1866901175491624960
author Li, Y.Y.N.
author_facet Li, Y.Y.N.
contents <p>Catastrophic forgetting arises when gradient updates for a new task overwrite parameter directions critical to a previously learned task. We argue that the information field tensor Gamma_info, a curvature object derived from the entropy functional of the model's predictive distribution [Li 2026], provides a geometry-informed signal for continual learning: directions in the approximate null space of Gamma_info are information-neutral and potentially safe to update.</p> <p>We instantiate this view through an audit-gated gradient projection family. Rather than claiming exact full-parameter null-space recovery, we use Gamma cross-batch reproducibility audits: each parameter's Gamma_info sub-block is estimated on two disjoint batch halves, and only parameters whose near-null eigenspaces align across both halves (Criterion A > 0.5, passing in >= 2 independent subsets) enter gradient projection. A matched random-direction control (same support indices and subspace rank) tests whether the audit-identified direction, rather than the projection operation alone, is the source of any forgetting benefit.</p> <p>In a cross-domain continual learning experiment (GPT-2, WikiText-2 -> Biomedical Medical QA), the audit admits 41-42 of 42 candidate parameters across 5 random seeds, demonstrating robust null-space structure throughout GPT-2 layers h.6-h.11. Audit-gated null projection (gamma_along) significantly reduces forgetting versus unconstrained fine-tuning (+331 +/- 30 vs. +414 +/- 45, Delta = -83, p < 0.05, 5 seeds), while preserving Task B perplexity (9.55 vs. 9.54 for unconstrained fine-tuning). The direction effect has the expected sign: forgetting under gamma_along is lower than under the matched random control gamma_random (Delta = -38), supporting the geometric claim that audit-identified null directions, not merely projection, reduce forgetting.</p>
format Digital resource
id zenodo_https___doi_org_10_5281_zenodo_18930213
institution Zenodo
language
publishDate 2026
publisher Zenodo
record_format zenodo
title Audit-Gated Gradient Projection from Information-Curvature Index Subspaces for Continual Learning
url https://doi.org/10.5281/zenodo.18930213
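
The audit-and-project procedure described in the abstract can be illustrated with a small NumPy sketch. This is not the author's implementation. The record does not define Gamma_info or Criterion A, so the sketch assumes a generic symmetric curvature matrix per parameter block and uses the mean squared cosine of the principal angles between two near-null eigenspaces as a stand-in alignment score. All function names (`near_null_basis`, `subspace_alignment`, `audit_gate`, `project_onto`, `random_basis_like`) are hypothetical, and reading gamma_along as "project the gradient onto the audited near-null subspace" is likewise an assumption.

```python
import numpy as np


def near_null_basis(curvature, rank):
    """Orthonormal basis for the `rank` eigen-directions of smallest |eigenvalue|."""
    eigvals, eigvecs = np.linalg.eigh(curvature)  # curvature must be symmetric
    keep = np.argsort(np.abs(eigvals))[:rank]
    return eigvecs[:, keep]


def subspace_alignment(basis_a, basis_b):
    """Assumed stand-in for Criterion A: mean squared cosine of principal angles.

    Returns 1.0 when the two subspaces coincide and 0.0 when they are orthogonal.
    """
    overlap = basis_a.T @ basis_b
    return float(np.sum(overlap**2) / basis_a.shape[1])


def audit_gate(half_pairs, rank, threshold=0.5, min_passes=2):
    """Pass if at least `min_passes` disjoint-half pairs align above `threshold`.

    `half_pairs` is a list of (curvature_half_a, curvature_half_b) estimates,
    one pair per independent data subset.
    """
    passes = 0
    for curv_a, curv_b in half_pairs:
        score = subspace_alignment(
            near_null_basis(curv_a, rank), near_null_basis(curv_b, rank)
        )
        passes += score > threshold
    return passes >= min_passes


def project_onto(basis, grad):
    """Keep only the component of `grad` that lies in span(basis)."""
    return basis @ (basis.T @ grad)


def random_basis_like(basis, rng):
    """Matched control: a random orthonormal basis with the same shape (same rank)."""
    q, _ = np.linalg.qr(rng.standard_normal(basis.shape))
    return q
```

Under these assumptions, a parameter block that passes `audit_gate` would have its flattened gradient `g` replaced by `project_onto(basis, g)` before the optimizer step, while the control run would use `project_onto(random_basis_like(basis, rng), g)` on the same block. Comparing the two is what separates the effect of the audited direction from the effect of projection alone.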