APA Citation: Xiong, Z., Chen, S., & Lakkaraju, H. (2026). Monitorability as a Free Gift: How RLVR Spontaneously Aligns Reasoning.
Chicago Style (17th ed.) Citation: Xiong, Zidi, Shan Chen, and Himabindu Lakkaraju. Monitorability as a Free Gift: How RLVR Spontaneously Aligns Reasoning. 2026.
MLA (9th ed.) Citation: Xiong, Zidi, et al. Monitorability as a Free Gift: How RLVR Spontaneously Aligns Reasoning. 2026.