| Main Authors: | Xia, Wei; Tang, Haowen; Li, Luozheng |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | Computation and Language; Artificial Intelligence; Social and Information Networks |
| Online Access: | https://arxiv.org/abs/2601.04207 |
| _version_ | 1866911359971622912 |
|---|---|
| author | Xia, Wei; Tang, Haowen; Li, Luozheng |
| author_facet | Xia, Wei; Tang, Haowen; Li, Luozheng |
| contents | LLMs internally organize political ideology along low-dimensional structures that are partially, but not fully, aligned with human ideological space. This misalignment is systematic, model-specific, and measurable. We introduce a lightweight linear probe that both quantifies the misalignment and minimally corrects the output layer. The method is a simple and efficient way to align models with specific user opinions: instead of retraining the model, we calculate a bias score from its internal features and directly adjust the final output probabilities. This solution is practical and low-cost, and it preserves the original reasoning power of the model. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2601_04207 |
| institution | arXiv |
| publishDate | 2025 |
| record_format | arxiv |
| title | Ideology as a Problem: Lightweight Logit Steering for Annotator-Specific Alignment in Social Media Analysis |
| topic | Computation and Language; Artificial Intelligence; Social and Information Networks; I.2.7; K.4.1 |
| url | https://arxiv.org/abs/2601.04207 |