Wang, J., Xu, W., Yang, A., Zhou, W., Lu, L., Li, H., . . . Zhu, J. (2025). Enhancing the Outcome Reward-based RL Training of MLLMs with Self-Consistency Sampling.
Chicago Style (17th ed.) Citation: Wang, Jiahao, Weiye Xu, Aijun Yang, Wengang Zhou, Lewei Lu, Houqiang Li, Xiaohua Wang, and Jinguo Zhu. Enhancing the Outcome Reward-based RL Training of MLLMs with Self-Consistency Sampling. 2025.
MLA (9th ed.) Citation: Wang, Jiahao, et al. Enhancing the Outcome Reward-based RL Training of MLLMs with Self-Consistency Sampling. 2025.