| Main author: | Nowickij (Navitski), Kirill Vladimirovich |
|---|---|
| Format: | Digital resource |
| Language: | English |
| Published: | Zenodo, 2026 |
| Subjects: | |
| Online access: | https://doi.org/10.5281/zenodo.19628186 |
| _version_ | 1866901272691474432 |
|---|---|
| author | Nowickij (Navitski), Kirill Vladimirovich |
| author_facet | Nowickij (Navitski), Kirill Vladimirovich |
| contents | <p>Abstract<br>There is a failure mode in large language models that we do not have a good name for, and that we therefore tend not to treat seriously enough. It is not hallucination: the model is not asserting something false. It is not refusal: the model answers at length. It is the production of responses that carry the complete outward form of careful reasoning while the cognitive work that reasoning is supposed to represent has not, in any meaningful sense, occurred. We call this theatrical compliance, and we argue that it is, in practical terms, more dangerous than either of the failure modes that currently dominate alignment research. This paper identifies the phenomenon, characterizes its five principal forms, explains the asymmetry that makes it particularly costly in high-stakes settings, and outlines the design requirements for systems intended to resist it. We do not describe such a system in detail here. Our goal is to establish theatrical compliance as a research problem in its own right and to argue that addressing it requires instruments operating at a fundamentally different level of abstraction than task-level prompting frameworks.<br>Keywords: theatrical compliance, large language models, AI reasoning quality, cognitive process evaluation, prompt engineering, metacognitive systems.</p> |
| format | Recurso digital |
| id | zenodo_https___doi_org_10_5281_zenodo_19628186 |
| institution | Zenodo |
| language | eng |
| publishDate | 2026 |
| publisher | Zenodo |
| record_format | zenodo |
| title | Theatrical Compliance: A Failure Mode in Large Language Models |
| topic | theatrical compliance; large language models; AI reasoning quality; cognitive process evaluation; prompt engineering; metacognitive systems; AI alignment; chain-of-thought prompting; AI reasoning; language model failure modes; LLM reasoning evaluation; pseudo-reasoning; cognitive emptiness; AI safety; model interpretability; faithfulness of reasoning; evaluation frameworks; reasoning quality metrics; LLM auditing; cognitive bias in LLMs; reasoning traces; deep learning failures; language model reliability; AI risk assessment; human-AI interaction |
| url | https://doi.org/10.5281/zenodo.19628186 |