Shu, Y., Wei, C., Lin, H., Qiu, S., & Xiong, H. (2026). Reference-Sampled Boltzmann Projection for KL-Regularized RLVR: Target-Matched Weighted SFT, Finite One-Shot Gaps, and Policy Mirror Descent.
Chicago Style (17th ed.) Citation: Shu, Yao, Chenxing Wei, Hongbin Lin, Shuang Qiu, and Hui Xiong. Reference-Sampled Boltzmann Projection for KL-Regularized RLVR: Target-Matched Weighted SFT, Finite One-Shot Gaps, and Policy Mirror Descent. 2026.
MLA (9th ed.) Citation: Shu, Yao, et al. Reference-Sampled Boltzmann Projection for KL-Regularized RLVR: Target-Matched Weighted SFT, Finite One-Shot Gaps, and Policy Mirror Descent. 2026.