Aggarwal, P., Ghazvininejad, M., Kim, S., Kulikov, I., Lanchantin, J., Li, X., . . . Zhao, W. (2026). Reasoning over mathematical objects: On-policy reward modeling and test time aggregation.
Chicago Style Citation (17th ed.)
Aggarwal, Pranjal, et al. Reasoning over Mathematical Objects: On-policy Reward Modeling and Test Time Aggregation. 2026.
MLA Citation (9th ed.)
Aggarwal, Pranjal, et al. Reasoning over Mathematical Objects: On-policy Reward Modeling and Test Time Aggregation. 2026.
Caution: These citations may not be 100% accurate.