Gao, J., Chen, J., He, C., Xu, S., Jin, D., & Wu, Y. (2026). From Self-Evolving Synthetic Data to Verifiable-Reward RL: Post-Training Multi-turn Interactive Tool-Using Agents.
Chicago Style (17th ed.) Citation: Gao, Jiaxuan, Jiaao Chen, Chuyi He, Shusheng Xu, Di Jin, and Yi Wu. From Self-Evolving Synthetic Data to Verifiable-Reward RL: Post-Training Multi-turn Interactive Tool-Using Agents. 2026.
MLA (9th ed.) Citation: Gao, Jiaxuan, et al. From Self-Evolving Synthetic Data to Verifiable-Reward RL: Post-Training Multi-turn Interactive Tool-Using Agents. 2026.