Bibliographic Details
Main Authors: Sofroniew, Nicholas; Kauvar, Isaac; Saunders, William; Chen, Runjin; Henighan, Tom; Hydrie, Sasha; Citro, Craig; Pearce, Adam; Tarng, Julius; Gurnee, Wes; Batson, Joshua; Zimmerman, Sam; Rivoire, Kelley; Fish, Kyle; Olah, Chris; Lindsey, Jack
Format: Preprint
Published: 2026
Subjects: Artificial Intelligence; Computation and Language
Online Access: https://arxiv.org/abs/2604.07729
Title: Emotion Concepts and their Function in a Large Language Model
Institution: arXiv
Abstract: Large language models (LLMs) sometimes appear to exhibit emotional reactions. We investigate why this is the case in Claude Sonnet 4.5 and explore implications for alignment-relevant behavior. We find internal representations of emotion concepts, which encode the broad concept of a particular emotion and generalize across contexts and behaviors it might be linked to. These representations track the operative emotion concept at a given token position in a conversation, activating in accordance with that emotion's relevance to processing the present context and predicting upcoming text. Our key finding is that these representations causally influence the LLM's outputs, including Claude's preferences and its rate of exhibiting misaligned behaviors such as reward hacking, blackmail, and sycophancy. We refer to this phenomenon as the LLM exhibiting functional emotions: patterns of expression and behavior modeled after humans under the influence of an emotion, which are mediated by underlying abstract representations of emotion concepts. Functional emotions may work quite differently from human emotions, and do not imply that LLMs have any subjective experience of emotions, but appear to be important for understanding the model's behavior.