| Main author: | |
|---|---|
| Format: | Digital resource |
| Language: | English |
| Published: | Zenodo, 2026 |
| Subjects: | |
| Online access: | https://doi.org/10.5281/zenodo.18475763 |
Table of Contents:
- AI laboratories invest substantial effort in evaluation, benchmarking, and safety testing to ensure system reliability and mitigate harm. These practices predominantly assess model behaviour through outputs, task performance, and controlled test conditions. This paper identifies a structural limitation of such approaches: AI-mediated judgment formation can introduce governance-relevant risk even when models pass established evaluations. Drawing on AI safety research, evaluation science, and human–AI interaction literature, the paper demonstrates that laboratory sufficiency does not imply governance sufficiency. The resulting gap is not a failure of scientific diligence, but a boundary mismatch between evaluation objects and real-world judgment-shaping interaction.