Jiang, D. R., Bhandari, J., Yang, Y., Munos, R., & Lu, T. (2025). Aligning LLMs Toward Multi-Turn Conversational Outcomes Using Iterative PPO.
Chicago Style (17th ed.) Citation: Jiang, Daniel R., Jalaj Bhandari, Yukai Yang, Rémi Munos, and Tyler Lu. Aligning LLMs Toward Multi-Turn Conversational Outcomes Using Iterative PPO. 2025.
MLA (9th ed.) Citation: Jiang, Daniel R., et al. Aligning LLMs Toward Multi-Turn Conversational Outcomes Using Iterative PPO. 2025.