Scheid, A., Boursier, E., Durmus, A., Jordan, M. I., Ménard, P., Moulines, E., & Valko, M. (2024). Optimal Design for Reward Modeling in RLHF.
Chicago Style (17th ed.) Citation: Scheid, Antoine, Etienne Boursier, Alain Durmus, Michael I. Jordan, Pierre Ménard, Eric Moulines, and Michal Valko. Optimal Design for Reward Modeling in RLHF. 2024.
MLA (9th ed.) Citation: Scheid, Antoine, et al. Optimal Design for Reward Modeling in RLHF. 2024.