Bibliographic Details
Main author:	Nowickij (Navitski), Kirill Vladimirovich
Format:	Digital resource
Language:	English
Published:	Zenodo 2026
Subjects:	theatrical compliance; large language models; AI reasoning quality; cognitive process evaluation; prompt engineering; metacognitive systems; AI alignment; chain-of-thought prompting; AI reasoning; language model failure modes; LLM reasoning evaluation; pseudo-reasoning; cognitive emptiness; AI safety; model interpretability; faithfulness of reasoning; evaluation frameworks; reasoning quality metrics; LLM auditing; cognitive bias in LLMs; reasoning traces; deep learning failures; language model reliability; AI risk assessment; human-AI interaction
Online access:	https://doi.org/10.5281/zenodo.19628186

Summary:

Abstract: There is a failure mode in large language models that we do not have a good name for, and that we therefore tend not to treat seriously enough. It is not hallucination: the model is not asserting something false. It is not refusal: the model answers at length. It is the production of responses that carry the complete outward form of careful reasoning while the cognitive work that reasoning is supposed to represent has not, in any meaningful sense, occurred. We call this theatrical compliance, and we argue that it is, in practical terms, more dangerous than either of the failure modes that currently dominate alignment research. This paper identifies the phenomenon, characterizes its five principal forms, explains the asymmetry that makes it particularly costly in high-stakes settings, and outlines the design requirements for systems intended to resist it. We do not describe such a system in detail here. Our goal is to establish theatrical compliance as a research problem in its own right and to argue that addressing it requires instruments operating at a fundamentally different level of abstraction than task-level prompting frameworks.

Keywords: theatrical compliance, large language models, AI reasoning quality, cognitive process evaluation, prompt engineering, metacognitive systems.
