Staff View: Personalizing LLMs with Binary Feedback: A Preference-Corrected Optimization Framework :: Library Catalog

Bibliographic Details
Main Authors:	Ma, Xilai; Zhao, Liye; Yao, Weijun; Di, Haibing; Wang, Wenya; Li, Jing
Format:	Preprint
Published:	2026
Subjects:	Computation and Language; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2605.10043

_version_	1866909031126269952
author	Ma, Xilai; Zhao, Liye; Yao, Weijun; Di, Haibing; Wang, Wenya; Li, Jing
author_facet	Ma, Xilai; Zhao, Liye; Yao, Weijun; Di, Haibing; Wang, Wenya; Li, Jing
contents	Large Language Model (LLM) personalization aims to align model behaviors with individual user preferences. Existing methods often focus on isolated user histories, neglecting the essential role of inter-user differences. We propose C-BPO, a framework that personalizes LLMs via preference-calibrated binary signals. By treating target user data as positive feedback and other users' data as an auxiliary set of implicit negative signals, C-BPO captures distinct inter-user differences. To mitigate the preference overlap issue, where shared task knowledge is erroneously penalized, we derive an objective grounded in Positive-Unlabeled (PU) learning theory. This approach purifies negative signals by subtracting "positive bias", ensuring alignment with unique idiosyncrasies without compromising general helpfulness. Empirical experiments across various personalization tasks and backbone LLMs show C-BPO consistently outperforms baselines, demonstrating the efficacy of preference-calibrated binary signals in modeling inter-user differences.
format	Preprint
id	arxiv_https___arxiv_org_abs_2605_10043
institution	arXiv
publishDate	2026
record_format	arxiv
title	Personalizing LLMs with Binary Feedback: A Preference-Corrected Optimization Framework
topic	Computation and Language; Artificial Intelligence
url	https://arxiv.org/abs/2605.10043

Similar Items