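Staff View: Ideology as a Problem: Lightweight Logit Steering for Annotator-Specific Alignment in Social Media Analysis :: Library Catalog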

Bibliographic Details
Main Authors:	Xia, Wei; Tang, Haowen; Li, Luozheng
Format:	Preprint
Published:	2025
Subjects:	Computation and Language; Artificial Intelligence; Social and Information Networks; I.2.7; K.4.1
Online Access:	https://arxiv.org/abs/2601.04207

_version_	1866911359971622912
author	Xia, Wei; Tang, Haowen; Li, Luozheng
author_facet	Xia, Wei; Tang, Haowen; Li, Luozheng
contents	LLMs internally organize political ideology along low-dimensional structures that are partially, but not fully, aligned with human ideological space. This misalignment is systematic, model-specific, and measurable. We introduce a lightweight linear probe that both quantifies the misalignment and minimally corrects the output layer. This paper introduces a simple and efficient method for aligning models with specific user opinions. Instead of retraining the model, we calculate a bias score from its internal features and directly adjust the final output probabilities. This solution is practical and low-cost, and it preserves the original reasoning power of the model.
format	Preprint
id	arxiv_https___arxiv_org_abs_2601_04207
institution	arXiv
publishDate	2025
record_format	arxiv
title	Ideology as a Problem: Lightweight Logit Steering for Annotator-Specific Alignment in Social Media Analysis
topic	Computation and Language; Artificial Intelligence; Social and Information Networks; I.2.7; K.4.1
url	https://arxiv.org/abs/2601.04207

Similar Items