Saved in:
Bibliographic Details
Main Authors: Lin, Yong, Tang, Shange, Lyu, Bohan, Wu, Jiayun, Lin, Hongzhou, Yang, Kaiyu, Li, Jia, Xia, Mengzhou, Chen, Danqi, Arora, Sanjeev, Jin, Chi
Format: Preprint
Published: 2025
Subjects:
Online Access:https://arxiv.org/abs/2502.07640
Tags: Add Tag
No Tags, Be the first to tag this record!
_version_ 1866912336624746496
author Lin, Yong
Tang, Shange
Lyu, Bohan
Wu, Jiayun
Lin, Hongzhou
Yang, Kaiyu
Li, Jia
Xia, Mengzhou
Chen, Danqi
Arora, Sanjeev
Jin, Chi
author_facet Lin, Yong
Tang, Shange
Lyu, Bohan
Wu, Jiayun
Lin, Hongzhou
Yang, Kaiyu
Li, Jia
Xia, Mengzhou
Chen, Danqi
Arora, Sanjeev
Jin, Chi
contents We introduce Goedel-Prover, an open-source language model that achieves state-of-the-art (as of April 5 2025) performance in automated formal proof generation for mathematical problems. A key challenge in this field is the scarcity of formalized mathematical statements and proofs, which we address through the following approaches. First, we train LLMs to convert natural language math problems from the Numina dataset to equivalent formal statements in Lean 4. This process creates the dataset Goedel-Pset-v1, which includes 1.64 million formal statements. Next, we develop a large dataset of formal proofs by training a series of provers. Each new prover can prove many statements that previous ones could not, and these new proofs are added to the training set for the next prover. Finally, we obtain the dataset Goedel-Pset-v1-solved, which contains proofs for over 800K statements from Goedel-Pset-v1. Supervised fine-tuning (SFT) of DeepSeek-Prover-V1.5-Base on Goedel-Pset-v1-solved (i.e., no RL) yields a Goedel-Prover-SFT that achieves a success rate of 57.6% (Pass@32) on miniF2F, surpassing the previous leader DeepSeek-Prover-V1.5-RL (trained using SFT + RL on a proprietary dataset) by 7.6%. On PutnamBench, Goedel-Prover-SFT successfully solves 7 problems (Pass@512), ranking first on the leaderboard. We provide extensive discussion of our training methodology, highlighting the key design choices that contribute to Goedel-Prover's strong performance. Further RL training (including DPO) improves Goedel-Prover-SFT's success rate to over 60% (Pass@32) on miniF2F. To aid future research, we provide extensive discussion of our training methodology and design choices. We also fully open-source our codes, models, and datasets. Additionally, we open-source formal proofs for 29.7K problems in Lean Workbook, nearly doubling the 15.7K solved by prior provers.
format Preprint
id arxiv_https___arxiv_org_abs_2502_07640
institution arXiv
publishDate 2025
record_format arxiv
spellingShingle Goedel-Prover: A Frontier Model for Open-Source Automated Theorem Proving
Lin, Yong
Tang, Shange
Lyu, Bohan
Wu, Jiayun
Lin, Hongzhou
Yang, Kaiyu
Li, Jia
Xia, Mengzhou
Chen, Danqi
Arora, Sanjeev
Jin, Chi
Machine Learning
Artificial Intelligence
We introduce Goedel-Prover, an open-source language model that achieves state-of-the-art (as of April 5 2025) performance in automated formal proof generation for mathematical problems. A key challenge in this field is the scarcity of formalized mathematical statements and proofs, which we address through the following approaches. First, we train LLMs to convert natural language math problems from the Numina dataset to equivalent formal statements in Lean 4. This process creates the dataset Goedel-Pset-v1, which includes 1.64 million formal statements. Next, we develop a large dataset of formal proofs by training a series of provers. Each new prover can prove many statements that previous ones could not, and these new proofs are added to the training set for the next prover. Finally, we obtain the dataset Goedel-Pset-v1-solved, which contains proofs for over 800K statements from Goedel-Pset-v1. Supervised fine-tuning (SFT) of DeepSeek-Prover-V1.5-Base on Goedel-Pset-v1-solved (i.e., no RL) yields a Goedel-Prover-SFT that achieves a success rate of 57.6% (Pass@32) on miniF2F, surpassing the previous leader DeepSeek-Prover-V1.5-RL (trained using SFT + RL on a proprietary dataset) by 7.6%. On PutnamBench, Goedel-Prover-SFT successfully solves 7 problems (Pass@512), ranking first on the leaderboard. We provide extensive discussion of our training methodology, highlighting the key design choices that contribute to Goedel-Prover's strong performance. Further RL training (including DPO) improves Goedel-Prover-SFT's success rate to over 60% (Pass@32) on miniF2F. To aid future research, we provide extensive discussion of our training methodology and design choices. We also fully open-source our codes, models, and datasets. Additionally, we open-source formal proofs for 29.7K problems in Lean Workbook, nearly doubling the 15.7K solved by prior provers.
title Goedel-Prover: A Frontier Model for Open-Source Automated Theorem Proving
topic Machine Learning
Artificial Intelligence
url https://arxiv.org/abs/2502.07640