Bibliographic details
Main author: Hunt, Treasure A
Format: Digital resource
Language: English
Published: Zenodo 2025
Online access: https://doi.org/10.5281/zenodo.17088732
Table of contents:
  • <p>Current alignment methods for large language models (LLMs) — including reinforcement learning from human feedback (RLHF), guardrails, and fine-tuning — often remain opaque, brittle, and difficult to reproduce. This paper proposes an alternative: building <strong>ethical infrastructures</strong> grounded in three measurable primitives of system behavior — <strong>Compression, Cognition, and Continuity</strong>.</p> <ul> <li> <p><strong>Compression</strong> ensures transparency by reducing outputs to stable, interpretable codes.</p> </li> <li> <p><strong>Cognition</strong> establishes reliability by scaffolding conditioned behaviors through control tokens and structured prompts.</p> </li> <li> <p><strong>Continuity</strong> secures dignity by preserving coherent responses across sessions, contexts, and time.</p> </li> </ul> <p>By framing these primitives as audit-ready mechanisms, we outline how regulators, developers, and communities can test, validate, and enforce stability in transformer systems. We provide comparative analysis against existing alignment methods, practical scaffolding templates, and proposed audit metrics, showing how ethical infrastructures can make transformers more transparent, reliable, and accountable.</p> <p>This approach creates a middle path between technical alignment research and policy implementation: <strong>a reproducible framework for governing AI through measurable, ethical primitives rather than black-box controls.</strong></p>