| Main author: | |
|---|---|
| Type: | Digital resource |
| Language: | English |
| Published: | Zenodo, 2026 |
| Subjects: | |
| Online access: | https://doi.org/10.5281/zenodo.19628186 |
Summary:
- <p>Abstract<br>There is a failure mode in large language models that we do not have a good name for, and that we therefore tend not to treat seriously enough. It is not hallucination: the model is not asserting something false. It is not refusal: the model answers at length. It is the production of responses that carry the complete outward form of careful reasoning while the cognitive work that reasoning is supposed to represent has not, in any meaningful sense, occurred. We call this theatrical compliance, and we argue that it is, in practical terms, more dangerous than either of the failure modes that currently dominate alignment research. This paper identifies the phenomenon, characterizes its five principal forms, explains the asymmetry that makes it particularly costly in high-stakes settings, and outlines the design requirements for systems intended to resist it. We do not describe such a system in detail here. Our goal is to establish theatrical compliance as a research problem in its own right and to argue that addressing it requires instruments operating at a fundamentally different level of abstraction than task-level prompting frameworks.<br>Keywords: theatrical compliance, large language models, AI reasoning quality, cognitive process evaluation, prompt engineering, metacognitive systems.</p>