Liang, K., Bai, C., Xu, X., Tang, C., Lee, S., Liu, W., . . . Wu, Y. (2026). ORBIT: On-policy Exploration-Exploitation for Controllable Multi-Budget Reasoning.
Chicago Style (17th ed.) Citation: Liang, Kun, Clive Bai, Xin Xu, Chenming Tang, Sanwoo Lee, Weijie Liu, Saiyong Yang, and Yunfang Wu. ORBIT: On-policy Exploration-Exploitation for Controllable Multi-Budget Reasoning. 2026.
MLA (9th ed.) Citation: Liang, Kun, et al. ORBIT: On-policy Exploration-Exploitation for Controllable Multi-Budget Reasoning. 2026.