Staff View: Autoformalization in the Wild: Assessing LLMs on Real-World Mathematical Definitions :: Library Catalog

Bibliographic Details
Main Authors:	Zhang, Lan; Valentino, Marco; Freitas, Andre
Format:	Preprint
Published:	2025
Subjects:	Computation and Language; Formal Languages and Automata Theory
Online Access:	https://arxiv.org/abs/2502.12065

_version_	1866908518736461824
author	Zhang, Lan; Valentino, Marco; Freitas, Andre
author_facet	Zhang, Lan; Valentino, Marco; Freitas, Andre
contents	Thanks to their linguistic capabilities, LLMs offer an opportunity to bridge the gap between informal mathematics and formal languages through autoformalization. However, it is still unclear how well LLMs generalize to sophisticated and naturally occurring mathematical statements. To address this gap, we investigate the task of autoformalizing real-world mathematical definitions: a critical component of mathematical discourse. Specifically, we introduce two novel resources for autoformalization, collecting definitions from Wikipedia (Def_Wiki) and arXiv papers (Def_ArXiv). We then systematically evaluate a range of LLMs, analyzing their ability to formalize definitions into Isabelle/HOL. Furthermore, we investigate strategies to enhance LLMs' performance including refinement through external feedback from Proof Assistants, and formal definition grounding, where we augment LLMs' formalizations through relevant contextual elements from formal mathematical libraries. Our findings reveal that definitions present a greater challenge compared to existing benchmarks, such as miniF2F. In particular, we found that LLMs still struggle with self-correction, and aligning with relevant mathematical libraries. At the same time, structured refinement methods and definition grounding strategies yield notable improvements of up to 16% on self-correction capabilities and 43% on the reduction of undefined errors, highlighting promising directions for enhancing LLM-based autoformalization in real-world scenarios.
format	Preprint
id	arxiv_https___arxiv_org_abs_2502_12065
institution	arXiv
publishDate	2025
record_format	arxiv
title	Autoformalization in the Wild: Assessing LLMs on Real-World Mathematical Definitions
topic	Computation and Language; Formal Languages and Automata Theory
url	https://arxiv.org/abs/2502.12065
