Min, Z., Liu, B., Wang, A., Zhang, L., Zeng, A., Zhang, H., & Su, J. (2026). Orchestrating Tokens and Sequences: Dynamic Hybrid Policy Optimization for RLVR.
Chicago Style (17th ed.) Citation: Min, Zijun, Bingshuai Liu, Ante Wang, Long Zhang, Anxiang Zeng, Haibo Zhang, and Jinsong Su. Orchestrating Tokens and Sequences: Dynamic Hybrid Policy Optimization for RLVR. 2026.
MLA (9th ed.) Citation: Min, Zijun, et al. Orchestrating Tokens and Sequences: Dynamic Hybrid Policy Optimization for RLVR. 2026.