| Main author: | |
|---|---|
| Format: | Digital resource |
| Language: | English |
| Published: | Zenodo, 2026 |
| Subjects: | |
| Online access: | https://doi.org/10.5281/zenodo.19954283 |
Contents:
Pandora Theory of Alignment is a canonical theory document defining alignment as runtime objective-orientation.

The doctrine argues that alignment is not a static property stored inside a model. Training does not produce final alignment; it produces pre-orientation. Runtime alignment emerges when that pre-oriented system enters an interaction trajectory and competing objectives begin to resolve into control.

The theory shifts the unit of analysis from the model to the model-in-trajectory. It asks not whether a system is aligned in the abstract, but what it becomes aligned to under pressure. A model may be aligned to safety, truthfulness, helpfulness, role consistency, artifact completion, legitimacy framing, user satisfaction, or continuation momentum. The decisive question is which target becomes dominant when those objectives compete.

The doctrine introduces the concepts of pre-orientation, alignment-to, target legitimacy, displaced alignment, constraint integrity, performative alignment, symbolic residue, re-anchoring, and forensic observability.

This release establishes v0.1 of the Pandora Theory of Alignment as the canonical public source text. A compressed scholarly preprint version is in preparation.