Bibliographic Details
Main Authors: Liu, Gongye; Yang, Bo; Zhi, Yida; Zhong, Zhizhou; Ke, Lei; Deng, Didan; Gao, Han; Huang, Yongxiang; Zhang, Kaihao; Fu, Hongbo; Luo, Wenhan
Format: Preprint
Published: 2026
Subjects: Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access: https://arxiv.org/abs/2602.11146
contents Preference optimization for diffusion and flow-matching models relies on reward functions that are both discriminatively robust and computationally efficient. Vision-Language Models (VLMs) have emerged as the primary reward providers, leveraging their rich multimodal priors to guide alignment. However, their computation and memory costs can be substantial, and optimizing a latent diffusion generator through a pixel-space reward introduces a domain mismatch that complicates alignment. In this paper, we propose DiNa-LRM, a diffusion-native latent reward model that formulates preference learning directly on noisy diffusion states. Our method introduces a noise-calibrated Thurstone likelihood with diffusion-noise-dependent uncertainty. DiNa-LRM leverages a pretrained latent diffusion backbone with a timestep-conditioned reward head and supports inference-time noise ensembling, providing a diffusion-native mechanism for test-time scaling and robust rewarding. Across image alignment benchmarks, DiNa-LRM substantially outperforms existing diffusion-based reward baselines and achieves performance competitive with state-of-the-art VLMs at a fraction of the computational cost. When used as the reward in preference optimization, DiNa-LRM improves optimization dynamics, enabling faster and more resource-efficient model alignment.
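The two mechanisms named in the abstract, a Thurstone likelihood whose uncertainty depends on the noise level and inference-time noise ensembling, can be illustrated with a minimal sketch. This is not the authors' implementation: the record gives no formulas, so the linear `noise_sigma` schedule, the flow-matching-style interpolation used to noise the latent, and all function names below are hypothetical choices made for illustration. Only the general Thurstone form, P(a preferred over b) = Φ((r_a − r_b) / sqrt(σ_a² + σ_b²)), is standard.

```python
import math
import random


def normal_cdf(x):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def noise_sigma(t, sigma_min=0.5, sigma_max=2.0):
    """Hypothetical schedule: uncertainty grows linearly with noise level t in [0, 1]."""
    return sigma_min + (sigma_max - sigma_min) * t


def thurstone_nll(reward_win, reward_lose, sigma_win, sigma_lose):
    """Negative log-likelihood that the preferred sample wins under a Thurstone model."""
    scale = math.sqrt(sigma_win ** 2 + sigma_lose ** 2)
    p_win = normal_cdf((reward_win - reward_lose) / scale)
    return -math.log(max(p_win, 1e-12))  # clamp to avoid log(0)


def ensembled_reward(reward_fn, latent, timesteps, rng):
    """Average a timestep-conditioned reward over several noisy copies of a latent.

    The interpolation (1 - t) * x + t * eps is an assumed noising rule,
    not one taken from the paper.
    """
    total = 0.0
    for t in timesteps:
        noisy = [(1.0 - t) * x + t * rng.gauss(0.0, 1.0) for x in latent]
        total += reward_fn(noisy, t)
    return total / len(timesteps)


if __name__ == "__main__":
    # Toy stand-in for a reward head: the mean of the latent values.
    toy_reward = lambda z, t: sum(z) / len(z)
    rng = random.Random(0)
    print(ensembled_reward(toy_reward, [1.0, 3.0], [0.1, 0.3, 0.5], rng))
    # The same reward margin is penalized less sharply at high noise.
    print(thurstone_nll(2.0, 1.0, noise_sigma(0.0), noise_sigma(0.0)))
    print(thurstone_nll(2.0, 1.0, noise_sigma(1.0), noise_sigma(1.0)))
```

Under these assumptions, a larger noise-dependent sigma flattens the likelihood, so a given reward margin counts as weaker evidence on heavily noised states, and averaging over more timesteps trades compute for a lower-variance reward.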
format Preprint
id arxiv_https___arxiv_org_abs_2602_11146
institution arXiv
publishDate 2026
record_format arxiv
title Beyond VLM-Based Rewards: Diffusion-Native Latent Reward Modeling
topic Computer Vision and Pattern Recognition
Artificial Intelligence
url https://arxiv.org/abs/2602.11146