| Main author: | Li, Y.Y.N. |
|---|---|
| Format: | Digital resource |
| Language: | |
| Published: | Zenodo, 2026 |
| Online access: | https://doi.org/10.5281/zenodo.18930213 |
| _version_ | 1866901175491624960 |
|---|---|
| author | Li, Y.Y.N. |
| author_facet | Li, Y.Y.N. |
| contents | <p>Catastrophic forgetting arises when gradient updates for a new task overwrite parameter directions critical to a previously learned task. We argue that the information field tensor Gamma_info -- a curvature object derived from the entropy functional of the model's predictive distribution [Li 2026] -- provides a geometry-informed signal for continual learning: directions in the approximate null space of Gamma_info are information-neutral and potentially safe to update.</p> <p>We instantiate this view through an audit-gated gradient projection family. Rather than claiming exact full-parameter null-space recovery, we use Gamma cross-batch reproducibility audits: each parameter's Gamma_info sub-block is estimated on two disjoint batch halves, and only parameters whose near-null eigenspaces align across both halves (Criterion A > 0.5, passing in >= 2 independent subsets) enter gradient projection. A matched random-direction control (same support indices and subspace rank) isolates whether the audit-identified direction -- not merely the projection operation -- is the source of any forgetting benefit.</p> <p>In a cross-domain continual learning experiment (GPT-2, WikiText-2 -> Biomedical Medical QA), 41-42 of 42 candidate parameters pass the audit across 5 random seeds, indicating robust null-space structure throughout GPT-2 layers h.6-h.11. Audit-gated null projection (gamma_along) significantly reduces forgetting versus unconstrained fine-tuning (+331 +/- 30 vs. +414 +/- 45, Delta = -83, p < 0.05, 5 seeds), while preserving Task B perplexity (9.55 vs. 9.54 for unconstrained fine-tuning). The direction effect has the expected sign: forgetting under gamma_along is lower than under the matched random control gamma_random (Delta = -38), consistent with the geometric claim that audit-identified null directions -- not merely projection -- reduce forgetting.</p> |
| format | Digital resource |
| id | zenodo_https___doi_org_10_5281_zenodo_18930213 |
| institution | Zenodo |
| language | |
| publishDate | 2026 |
| publisher | Zenodo |
| record_format | zenodo |
| title | Audit-Gated Gradient Projection from Information-Curvature Index Subspaces for Continual Learning |
| url | https://doi.org/10.5281/zenodo.18930213 |
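
The audit-then-project procedure described in the abstract can be illustrated with a small NumPy sketch. This is not the author's implementation. The record does not define Criterion A, so the alignment score below (mean squared cosine of the principal angles between the two near-null subspaces) is an assumed stand-in. All function names are hypothetical, and a toy symmetric matrix replaces a real Gamma_info sub-block estimated from a model.

```python
import numpy as np


def near_null_basis(gamma, k):
    """Orthonormal basis of the k smallest-|eigenvalue| directions of a symmetric block."""
    gamma = 0.5 * (gamma + gamma.T)  # symmetrize against numerical asymmetry
    eigvals, eigvecs = np.linalg.eigh(gamma)
    order = np.argsort(np.abs(eigvals))[:k]
    return eigvecs[:, order]


def subspace_alignment(basis_a, basis_b):
    """Assumed stand-in for Criterion A: mean squared cosine of principal angles, in [0, 1]."""
    k = basis_a.shape[1]
    return float(np.linalg.norm(basis_a.T @ basis_b, "fro") ** 2 / k)


def project_onto(basis, grad):
    """Keep only the component of grad that lies in span(basis)."""
    return basis @ (basis.T @ grad)


def random_basis_like(basis, rng):
    """Matched control: a random orthonormal basis with the same shape (same rank)."""
    q, _ = np.linalg.qr(rng.standard_normal(basis.shape))
    return q


def noisy(gamma, scale, rng):
    """Toy model of a batch-half estimate: the true block plus symmetric noise."""
    noise = rng.standard_normal(gamma.shape) * scale
    return gamma + 0.5 * (noise + noise.T)


rng = np.random.default_rng(0)
d, k = 6, 2

# Toy curvature block with an exact 2-dimensional null space.
q, _ = np.linalg.qr(rng.standard_normal((d, d)))
spectrum = np.array([0.0, 0.0, 1.0, 2.0, 3.0, 4.0])
gamma_true = (q * spectrum) @ q.T

# Audit: estimate the near-null eigenspace on two disjoint "batch halves".
basis_half_1 = near_null_basis(noisy(gamma_true, 1e-3, rng), k)
basis_half_2 = near_null_basis(noisy(gamma_true, 1e-3, rng), k)
score = subspace_alignment(basis_half_1, basis_half_2)
passes_audit = score > 0.5  # threshold quoted in the abstract

# Gate: project the new-task gradient only if the audit passes.
grad = rng.standard_normal(d)
grad_along = project_onto(basis_half_1, grad) if passes_audit else grad

# Control: same rank, random direction.
grad_random = project_onto(random_basis_like(basis_half_1, rng), grad)

print("alignment score:", score, "passes audit:", passes_audit)
```

In this toy setting the projected gradient is nearly annihilated by the true curvature block, while the random-control gradient generally is not. That is the contrast the matched control is designed to expose. The abstract's additional requirement that a parameter pass in at least 2 independent subsets is not modeled here.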