APA (7th ed.) Citation

Wang, X., Wu, T., Tang, M., Li, J., Liu, Q., & Zheng, Z. (2026). The flip side of RLHF: On-policy feedback for reward model self-supervised improvement.

Chicago Style (17th ed.) Citation

Wang, Xiaobo, Tong Wu, Min Tang, Jiaqi Li, Qi Liu, and Zilong Zheng. The Flip Side of RLHF: On-Policy Feedback for Reward Model Self-Supervised Improvement. 2026.

MLA (9th ed.) Citation

Wang, Xiaobo, et al. The Flip Side of RLHF: On-Policy Feedback for Reward Model Self-Supervised Improvement. 2026.
