Staff View: A Unified Representation Underlying the Judgment of Large Language Models :: Library Catalog

Bibliographic Details
Main Authors:	Lu, Yi-Long; Song, Jiajun; Wang, Wei
Format:	Preprint
Published:	2025
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2510.27328

_version_	1866912687410118656
author	Lu, Yi-Long; Song, Jiajun; Wang, Wei
author_facet	Lu, Yi-Long; Song, Jiajun; Wang, Wei
contents	A central architectural question for both biological and artificial intelligence is whether judgment relies on specialized modules or a unified, domain-general resource. While the discovery of decodable neural representations for distinct concepts in Large Language Models (LLMs) has suggested a modular architecture, whether these representations are truly independent systems remains an open question. Here we provide evidence for a convergent architecture for evaluative judgment. Across a range of LLMs, we find that diverse evaluative judgments are computed along a dominant dimension, which we term the Valence-Assent Axis (VAA). This axis jointly encodes subjective valence ("what is good") and the model's assent to factual claims ("what is true"). Through direct interventions, we demonstrate this axis drives a critical mechanism, which is identified as the subordination of reasoning: the VAA functions as a control signal that steers the generative process to construct a rationale consistent with its evaluative state, even at the cost of factual accuracy. Our discovery offers a mechanistic account for response bias and hallucination, revealing how an architecture that promotes coherent judgment can systematically undermine faithful reasoning.
format	Preprint
id	arxiv_https___arxiv_org_abs_2510_27328
institution	arXiv
publishDate	2025
record_format	arxiv
title	A Unified Representation Underlying the Judgment of Large Language Models
topic	Computation and Language
url	https://arxiv.org/abs/2510.27328

Similar Items