Khan, R. M. S., Liu, Z., Tan, Z., Fleming, C., & Chen, T. (2026). TMS: Trajectory-Mixed Supervision for Reward-Free, On-Policy SFT.
Chicago Style (17th ed.) Citation
Khan, Rana Muhammad Shahroz, Zijie Liu, Zhen Tan, Charles Fleming, and Tianlong Chen. TMS: Trajectory-Mixed Supervision for Reward-Free, On-Policy SFT. 2026.
MLA (9th ed.) Citation
Khan, Rana Muhammad Shahroz, et al. TMS: Trajectory-Mixed Supervision for Reward-Free, On-Policy SFT. 2026.