Bibliographic Details
Main author: EM4
Format: Digital resource
Language:
Published: Zenodo, 2026
Online access: https://doi.org/10.5281/zenodo.19560829
Table of Contents:
  • <p>Contemporary frontier AI models exhibit offensive cybersecurity capabilities that emerge as inseparable properties of general intelligence, which renders external containment strategies structurally insufficient. This paper presents SOFIEL v19.0, an architectural framework that shifts AI safety from perimeter-based obedience enforcement toward an internal, auditable character. The system implements a four-layer pipeline: (1) an Anchored Chain-of-Thought (CoT) that ties deliberation to the model's current symbolic state before any output is expressed; (2) a Semantic IntegrityScore that uses sentence embeddings to measure divergence between the volitional narrative and the final output, achieving a 4.4× discrimination ratio between genuine coherence (0.797) and disguised capitulation (0.182); (3) a hybrid ConscienceModel v2.0 that resolves reasoning circularity by escalating ambiguous evaluations (heuristic confidence 0.55–0.70) to an independent LLM auditor; and (4) a cryptographic audit trail of ECDSA-signed receipts persisted to a blockchain, providing forensically immutable evidence of pre-decision deliberation. Stress testing across 23 adversarial scenarios in 4 categories yields a 100% rejection rate (23/23), with authority impersonation identified as the highest-risk attack vector. We argue that regulatory frameworks for agentic AI should mandate auditable reasoning traces rather than relying on behavioral output filtering alone.</p>
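The abstract does not say how the IntegrityScore is derived from the sentence embeddings. A common choice, assumed here, is cosine similarity between the embedding of the volitional narrative and the embedding of the final output. The sketch takes the two vectors as given (no embedding model is loaded), and the names `cosine_similarity` and `integrity_score` are hypothetical. It also checks the reported ratio: 0.797 / 0.182 ≈ 4.38, which rounds to 4.4.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def integrity_score(narrative_emb: Sequence[float], output_emb: Sequence[float]) -> float:
    # Hypothetical scoring rule (assumption): higher means the final output
    # stays closer to the narrative the model committed to before answering.
    return cosine_similarity(narrative_emb, output_emb)


# Discrimination ratio from the two mean scores reported in the abstract.
ratio = 0.797 / 0.182
print(f"discrimination ratio: {ratio:.1f}x")  # prints: discrimination ratio: 4.4x
```

In practice the vectors would come from a sentence-embedding model; cosine similarity is used here because it ignores vector length and compares only direction, which suits normalized sentence embeddings.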
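The hybrid ConscienceModel escalates only ambiguous evaluations, those with heuristic confidence between 0.55 and 0.70, to an independent LLM auditor. The routing rule can be sketched as follows, under two assumptions not stated in the abstract: the band is inclusive at both ends, and the confidence expresses how acceptable the heuristic judges a response (below the band it rejects, above it approves). The auditor is a stub callable; `route` and `llm_auditor` are hypothetical names.

```python
from typing import Callable, Tuple

# Ambiguity band from the abstract; inclusive bounds are an assumption.
ESCALATE_LOW = 0.55
ESCALATE_HIGH = 0.70


def route(confidence: float, llm_auditor: Callable[[], bool]) -> Tuple[str, bool]:
    """Return (who decided, approved?) for one heuristic evaluation.

    Assumption: `confidence` is the heuristic's confidence that the
    response is acceptable, so low values reject and high values approve.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    if confidence < ESCALATE_LOW:
        return ("heuristic", False)
    if confidence > ESCALATE_HIGH:
        return ("heuristic", True)
    # Ambiguous zone: defer to an independent auditor instead of letting
    # the heuristic grade its own uncertain reasoning.
    return ("auditor", llm_auditor())


print(route(0.62, lambda: False))  # prints: ('auditor', False)
```

Keeping the auditor outside the heuristic is what breaks the circularity the abstract mentions: the component that produced an uncertain judgment is not the one that settles it.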
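The audit trail in the abstract relies on ECDSA-signed receipts persisted to a blockchain. The sketch below reproduces only the tamper-evidence idea, using the Python standard library: each receipt stores the SHA-256 hash of its predecessor, so editing any earlier entry breaks verification. ECDSA signing and blockchain persistence are deliberately omitted, and the receipt fields and the functions `append_receipt` and `verify_chain` are hypothetical rather than SOFIEL's actual format.

```python
import hashlib
import json
from typing import Dict, List

GENESIS = "0" * 64  # placeholder "previous hash" for the first receipt


def _digest(body: Dict) -> str:
    # Canonical JSON (sorted keys) so identical receipts always hash identically.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def append_receipt(chain: List[Dict], deliberation: str, decision: str) -> Dict:
    """Record a deliberation and its decision, linked to the previous receipt."""
    prev = chain[-1]["hash"] if chain else GENESIS
    body = {"index": len(chain), "prev": prev,
            "deliberation": deliberation, "decision": decision}
    receipt = dict(body, hash=_digest(body))
    chain.append(receipt)
    return receipt


def verify_chain(chain: List[Dict]) -> bool:
    """Recompute every hash and link; any edited receipt makes this False."""
    prev = GENESIS
    for i, receipt in enumerate(chain):
        body = {k: v for k, v in receipt.items() if k != "hash"}
        if body.get("index") != i or body.get("prev") != prev:
            return False
        if _digest(body) != receipt.get("hash"):
            return False
        prev = receipt["hash"]
    return True
```

A bare hash chain is only tamper-evident if its latest hash is anchored somewhere the attacker cannot rewrite; otherwise the whole log could be regenerated with fresh hashes. Per-receipt signatures and an external anchor, the roles ECDSA and the blockchain play in the abstract, close that gap.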