
Bibliographic Details
Main Authors:	Pukdee, Rattana; Balcan, Maria-Florina; Ravikumar, Pradeep
Format:	Preprint
Published:	2026
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2602.10286

_version_	1866911730401017856
author	Pukdee, Rattana; Balcan, Maria-Florina; Ravikumar, Pradeep
author_facet	Pukdee, Rattana; Balcan, Maria-Florina; Ravikumar, Pradeep
contents	Pairwise preference learning is central to machine learning, with recent applications in aligning language models with human preferences. A typical dataset consists of triplets $(x, y^+, y^-)$, where response $y^+$ is preferred over response $y^-$ for context $x$. The Bradley--Terry (BT) model is the predominant approach, modeling preference probabilities as a function of latent score differences. Standard practice assumes data follows this model and learns the latent scores accordingly. However, real data may violate this assumption, and it remains unclear what BT learning recovers in such cases. Starting from triplet comparison data, we formalize the preference information it encodes through the conditional preference distribution (CPRD). We give precise conditions for when BT is appropriate for modeling the CPRD, and identify factors governing sample efficiency -- namely, margin and connectivity. Together, these results offer a data-centric foundation for understanding what preference learning actually recovers.
format	Preprint
id	arxiv_https___arxiv_org_abs_2602_10286
institution	arXiv
publishDate	2026
record_format	arxiv
title	What Does Preference Learning Recover from Pairwise Comparison Data?
topic	Machine Learning
url	https://arxiv.org/abs/2602.10286
