_version_ 1866901272691474432
author Nowickij (Navitski), Kirill Vladimirovich
author_facet Nowickij (Navitski), Kirill Vladimirovich
contents <p>Abstract<br>There is a failure mode in large language models that we do not have a good name for, and that we therefore tend not to treat seriously enough. It is not hallucination: the model is not asserting something false. It is not refusal: the model answers at length. It is the production of responses that carry the complete outward form of careful reasoning while the cognitive work that reasoning is supposed to represent has not, in any meaningful sense, occurred. We call this theatrical compliance, and we argue that it is, in practical terms, more dangerous than either of the failure modes that currently dominate alignment research. This paper identifies the phenomenon, characterizes its five principal forms, explains the asymmetry that makes it particularly costly in high-stakes settings, and outlines the design requirements for systems intended to resist it. We do not describe such a system in detail here. Our goal is to establish theatrical compliance as a research problem in its own right and to argue that addressing it requires instruments operating at a fundamentally different level of abstraction than task-level prompting frameworks.<br>Keywords: theatrical compliance, large language models, AI reasoning quality, cognitive process evaluation, prompt engineering, metacognitive systems.</p>
format Digital resource
id zenodo_https___doi_org_10_5281_zenodo_19628186
institution Zenodo
language eng
publishDate 2026
publisher Zenodo
record_format zenodo
spellingShingle Theatrical Compliance: A Failure Mode in Large Language Models
Nowickij (Navitski), Kirill Vladimirovich
theatrical compliance
large language models
AI reasoning quality
cognitive process evaluation
prompt engineering
metacognitive systems
AI alignment
chain-of-thought prompting
AI reasoning
language model failure modes
LLM reasoning evaluation
pseudo-reasoning
cognitive emptiness
AI safety
model interpretability
faithfulness of reasoning
evaluation frameworks
reasoning quality metrics
LLM auditing
cognitive bias in LLMs
reasoning traces
deep learning failures
language model reliability
AI risk assessment
human-AI interaction
title Theatrical Compliance: A Failure Mode in Large Language Models
topic theatrical compliance
large language models
AI reasoning quality
cognitive process evaluation
prompt engineering
metacognitive systems
AI alignment
chain-of-thought prompting
AI reasoning
language model failure modes
LLM reasoning evaluation
pseudo-reasoning
cognitive emptiness
AI safety
model interpretability
faithfulness of reasoning
evaluation frameworks
reasoning quality metrics
LLM auditing
cognitive bias in LLMs
reasoning traces
deep learning failures
language model reliability
AI risk assessment
human-AI interaction
url https://doi.org/10.5281/zenodo.19628186