Staff View: Threshold-Guided Optimization for Visual Generative Models :: Library Catalog

Bibliographic Details
Main Authors:	Bai, Jinbin; Lei, Yu; Shi, Qingyu; Feng, Aosong; Xin, Yi; Zhao, Zhuoran; Shen, Fei; Yu, Kaidong; Li, Jason
Format:	Preprint
Published:	2026
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2605.04653

_version_	1866915983766061056
author	Bai, Jinbin; Lei, Yu; Shi, Qingyu; Feng, Aosong; Xin, Yi; Zhao, Zhuoran; Shen, Fei; Yu, Kaidong; Li, Jason
author_facet	Bai, Jinbin; Lei, Yu; Shi, Qingyu; Feng, Aosong; Xin, Yi; Zhao, Zhuoran; Shen, Fei; Yu, Kaidong; Li, Jason
contents	Aligning large visual generative models with human feedback is often performed through pairwise preference optimization. While such approaches are conceptually simple, they fundamentally rely on annotated pairs, limiting scalability in settings where feedback is collected as independent scalar ratings. In this work, we revisit the KL-regularized alignment objective and show that the optimal policy implicitly compares each sample's reward to an instance-specific baseline that is generally intractable. We propose a threshold-guided alignment framework that replaces this oracle baseline with a data-driven global threshold estimated from empirical score statistics. This formulation turns alignment into a binary decision task on unpaired data, enabling effective optimization directly from scalar feedback. We also incorporate a confidence weighting term to emphasize samples whose scores deviate strongly from the threshold, improving sample efficiency. Experiments across both diffusion and masked generative paradigms, spanning three test sets and five reward models, show that our method consistently improves preference alignment over previous methods. These results position our threshold-guided framework as a simple yet principled alternative for aligning visual generative models without paired comparisons.
format	Preprint
id	arxiv_https___arxiv_org_abs_2605_04653
institution	arXiv
publishDate	2026
record_format	arxiv
title	Threshold-Guided Optimization for Visual Generative Models
topic	Machine Learning
url	https://arxiv.org/abs/2605.04653
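
The abstract in the contents field describes turning unpaired scalar ratings into binary decisions against a global threshold, with a confidence weight that grows as a score moves away from that threshold. The sketch below is only an illustration of that idea and is not the authors' implementation: the choice of the median as the threshold statistic, the max-normalized absolute gap as the confidence weight, and the function name `threshold_labels` are all assumptions, since the record does not specify them.

```python
from statistics import median


def threshold_labels(scores):
    """Label unpaired scalar scores against a single global threshold.

    Hypothetical helper: the median threshold and the normalized-gap
    weighting are illustrative assumptions, not taken from the paper.
    """
    if not scores:
        raise ValueError("scores must be non-empty")

    # Assumption: use the empirical median as the global threshold.
    tau = median(scores)

    # Binary decision per sample: 1 if the score is above the threshold.
    labels = [1 if s > tau else 0 for s in scores]

    # Assumption: confidence weight is the absolute gap to the threshold,
    # scaled so the most extreme sample has weight 1.0.
    gaps = [abs(s - tau) for s in scores]
    max_gap = max(gaps)
    weights = [g / max_gap if max_gap > 0 else 1.0 for g in gaps]

    return tau, labels, weights


if __name__ == "__main__":
    tau, labels, weights = threshold_labels([1.0, 2.0, 3.0, 4.0, 5.0])
    print(tau, labels, weights)
```

In a training loop, labels and weights of this kind could feed a weighted binary objective; the actual loss used by the paper is not stated in this record.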