Wang, X., Wu, T., Tang, M., Li, J., Liu, Q., & Zheng, Z. (2026). The Flip Side of RLHF: On-Policy Feedback for Reward Model Self-Supervised Improvement.
Chicago Style (17th ed.) Citation: Wang, Xiaobo, Tong Wu, Min Tang, Jiaqi Li, Qi Liu, and Zilong Zheng. The Flip Side of RLHF: On-Policy Feedback for Reward Model Self-Supervised Improvement. 2026.
MLA (9th ed.) Citation: Wang, Xiaobo, et al. The Flip Side of RLHF: On-Policy Feedback for Reward Model Self-Supervised Improvement. 2026.