Chen, Z., Liu, F., Zhu, X., Qi, Y., & Ghavamzadeh, M. (2025). Preference Optimization via Contrastive Divergence: Your Reward Model is Secretly an NLL Estimator.
Chicago Style (17th ed.) Citation: Chen, Zhuotong, Fang Liu, Xuan Zhu, Yanjun Qi, and Mohammad Ghavamzadeh. Preference Optimization via Contrastive Divergence: Your Reward Model Is Secretly an NLL Estimator. 2025.
MLA (9th ed.) Citation: Chen, Zhuotong, et al. Preference Optimization via Contrastive Divergence: Your Reward Model Is Secretly an NLL Estimator. 2025.