APA (7th ed.) Citation

Wang, J., Xu, W., Yang, A., Zhou, W., Lu, L., Li, H., Wang, X., & Zhu, J. (2025). Enhancing the outcome reward-based RL training of MLLMs with self-consistency sampling.

Chicago Style (17th ed.) Citation

Wang, Jiahao, Weiye Xu, Aijun Yang, Wengang Zhou, Lewei Lu, Houqiang Li, Xiaohua Wang, and Jinguo Zhu. Enhancing the Outcome Reward-based RL Training of MLLMs with Self-Consistency Sampling. 2025.

MLA (9th ed.) Citation

Wang, Jiahao, et al. Enhancing the Outcome Reward-based RL Training of MLLMs with Self-Consistency Sampling. 2025.

Warning: These citations may not always be 100% accurate.