Bibliographic Details
Main Authors: Xia, Wei; Tang, Haowen; Li, Luozheng
Format: Preprint
Published: 2025
Subjects:
Online Access: https://arxiv.org/abs/2601.04207
Table of Contents:
  • LLMs internally organize political ideology along low-dimensional structures that are partially, but not fully, aligned with human ideological space. This misalignment is systematic, model-specific, and measurable. We introduce a lightweight linear probe that both quantifies the misalignment and minimally corrects the output layer, giving a simple and efficient way to align models with specific user opinions. Instead of retraining the model, we compute a bias score from its internal features and directly adjust the final output probabilities. The approach is practical and low-cost, and it preserves the model's original reasoning ability.
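
The record gives no implementation details, so the following is only a minimal sketch of the general idea described above (a linear probe on internal features that yields a bias score, followed by a direct adjustment of output probabilities). Everything in it is an assumption made for illustration: the function names `fit_linear_probe`, `bias_score`, and `adjust_probs`, the use of ridge regression to fit the probe, the per-option `option_positions` vector, and the `strength` parameter are hypothetical and are not taken from the paper.

```python
import numpy as np


def fit_linear_probe(features, scores, l2=1e-3):
    """Fit a ridge-regression probe (an assumed choice, not the paper's).

    features: (n, d) array of hidden-state vectors.
    scores:   (n,) array of human-labelled ideology scores.
    Returns the weight vector (d,) and the intercept.
    """
    n = features.shape[0]
    # Append a column of ones so the intercept is fitted jointly.
    X = np.hstack([features, np.ones((n, 1))])
    A = X.T @ X + l2 * np.eye(X.shape[1])
    w = np.linalg.solve(A, X.T @ scores)
    return w[:-1], float(w[-1])


def bias_score(hidden, w, b):
    """Project one hidden-state vector onto the probe direction."""
    return float(hidden @ w + b)


def adjust_probs(probs, option_positions, score, target, strength=1.0):
    """Reweight output probabilities toward a target score.

    probs:            (k,) original output probabilities.
    option_positions: (k,) assumed ideological position of each option.
    The logit shift is proportional to (target - score), so no change
    is applied when the measured score already matches the target.
    """
    probs = np.clip(np.asarray(probs, dtype=float), 1e-12, None)
    gap = target - score
    logits = np.log(probs) + strength * gap * np.asarray(option_positions)
    logits -= logits.max()  # subtract the max for numerical stability
    out = np.exp(logits)
    return out / out.sum()
```

In this sketch the model weights are never touched: the probe is fitted once on stored features, and the correction is applied only to the final distribution, which mirrors the "no retraining" claim in the summary. How the paper itself maps a bias score to a probability adjustment may differ.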