Feng, Y., Kwiatkowski, A., Zheng, K., Kempe, J., & Duan, Y. (2025). PILAF: Optimal Human Preference Sampling for Reward Modeling.
Chicago Style (17th ed.) Citation: Feng, Yunzhen, Ariel Kwiatkowski, Kunhao Zheng, Julia Kempe, and Yaqi Duan. PILAF: Optimal Human Preference Sampling for Reward Modeling. 2025.
MLA (9th ed.) Citation: Feng, Yunzhen, et al. PILAF: Optimal Human Preference Sampling for Reward Modeling. 2025.