MARC View :: Library Catalog

Bibliographic Details
Main Authors:	Dang, Xingyu; Agarwal, Rohit; Porto, Rodrigo; Goyal, Anirudh; Fowl, Liam H; Arora, Sanjeev
Format:	Preprint
Published:	2026
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2602.16793

_version_	1866915805790208000
author	Dang, Xingyu; Agarwal, Rohit; Porto, Rodrigo; Goyal, Anirudh; Fowl, Liam H; Arora, Sanjeev
author_facet	Dang, Xingyu; Agarwal, Rohit; Porto, Rodrigo; Goyal, Anirudh; Fowl, Liam H; Arora, Sanjeev
contents	In the past year, custom and unreleased math reasoning models reached gold medal performance on the International Mathematical Olympiad (IMO). Similar performance was then reported using large-scale inference on publicly available models but at prohibitive costs (e.g., 3000 USD per problem). In this work, we present an inference pipeline that attains best-in-class performance on IMO-style math problems at an average inference cost orders of magnitude below competing methods while using only general-purpose off-the-shelf models. Our method relies on insights about grader failure in solver-grader pipelines, which we call the Cognitive Well (iterative refinement converging to a wrong solution that the solver as well as the pipeline's internal grader consider to be basically correct). Our pipeline addresses these failure modes through conjecture extraction, wherein candidate lemmas are isolated from generated solutions and independently verified alongside their negations in a fresh environment (context detachment). On IMO-ProofBench Advanced (PB-Adv), our pipeline achieves 67.1 percent performance using Gemini 3.0 Pro with an average cost per question of approximately 31 USD. At the time of evaluation, this represented the state-of-the-art on PB-Adv among both public and unreleased models, and more than doubles the success rate of the next best publicly accessible pipeline, all at a fraction of the cost.
format	Preprint
id	arxiv_https___arxiv_org_abs_2602_16793
institution	arXiv
publishDate	2026
record_format	arxiv
title	Escaping the Cognitive Well: Efficient Competition Math with Off-the-Shelf Models
topic	Machine Learning
url	https://arxiv.org/abs/2602.16793