Yao, J., Huang, H., Luo, C., Wu, D., Liu, Z., Guo, Y., & Kang, Y. (2026). Policy Split: Incentivizing Dual-Mode Exploration in LLM Reinforcement with Dual-Mode Entropy Regularization.
Chicago Style (17th ed.) Citation
Yao, Jiashu, Heyan Huang, Chuwei Luo, Daiqing Wu, Zeming Liu, Yuhang Guo, and Yangyang Kang. Policy Split: Incentivizing Dual-Mode Exploration in LLM Reinforcement with Dual-Mode Entropy Regularization. 2026.
MLA (9th ed.) Citation
Yao, Jiashu, et al. Policy Split: Incentivizing Dual-Mode Exploration in LLM Reinforcement with Dual-Mode Entropy Regularization. 2026.