Staff View: :: Library Catalog

Saved in:

Bibliographic Details
Main Authors:	Xue, Zhantong, Ma, Pingchuan, Wang, Zhaoyu, Zhou, Yuguang, Zhang, Xiaoqin, Wang, Shuai, Rahmel, Juergen
Format:	Preprint
Published:	2025
Subjects:	Software Engineering
Online Access:	https://arxiv.org/abs/2509.11708
Tags:	Add Tag No Tags, Be the first to tag this record!

_version_	1866908801445134336
author	Xue, Zhantong Ma, Pingchuan Wang, Zhaoyu Zhou, Yuguang Zhang, Xiaoqin Wang, Shuai Rahmel, Juergen
author_facet	Xue, Zhantong Ma, Pingchuan Wang, Zhaoyu Zhou, Yuguang Zhang, Xiaoqin Wang, Shuai Rahmel, Juergen
contents	Zero-knowledge proofs (ZKPs) are increasingly deployed in domains such as privacy-preserving authentication, verifiable computation, and secure finance. However, authoring ZK programs remains challenging: unlike conventional software development, ZK programming manifests a fundamental paradigm shift from \textit{imperative computation} to \textit{declarative verification}. This process requires rigorous reasoning about finite field arithmetic and complex constraint systems (which is rare in common imperative languages), making it knowledge-intensive and error-prone. While large language models (LLMs) have demonstrated strong code generation capabilities in general-purpose languages, their effectiveness for ZK programming, where correctness hinges on both language mastery and constraint-level reasoning, remains unexplored. To address this gap, we propose \textsc{ZK-Eval}, a domain-specific evaluation pipeline that probes LLM capabilities on ZK programming at three levels: language knowledge, algebraic primitive competence, and end-to-end program generation. Our evaluation of four state-of-the-art LLMs reveals that while models demonstrate strong proficiency in language syntax, they struggle when implementing and composing algebraic primitives to specify correct constraint systems, frequently producing incorrect programs. Based on these insights, we introduce \textsc{ZK-Coder}, an agentic framework that augments LLMs with constraint sketching, guided retrieval, and interactive repair. Experiments with GPT-o3 on Circom and Noir show substantial gains, with success rates improving from 20.29\% to 87.85\% and from 28.38\% to 97.79\%, respectively. With \textsc{ZK-Eval} and \textsc{ZK-Coder}, we establish a new basis for systematically measuring and augmenting LLMs in ZK code generation to lower barriers for practitioners and advance privacy computing.
format	Preprint
id	arxiv_https___arxiv_org_abs_2509_11708
institution	arXiv
publishDate	2025
record_format	arxiv
spellingShingle	From Evaluation to Enhancement: Large Language Models for Zero-Knowledge Proof Code Generation Xue, Zhantong Ma, Pingchuan Wang, Zhaoyu Zhou, Yuguang Zhang, Xiaoqin Wang, Shuai Rahmel, Juergen Software Engineering Zero-knowledge proofs (ZKPs) are increasingly deployed in domains such as privacy-preserving authentication, verifiable computation, and secure finance. However, authoring ZK programs remains challenging: unlike conventional software development, ZK programming manifests a fundamental paradigm shift from \textit{imperative computation} to \textit{declarative verification}. This process requires rigorous reasoning about finite field arithmetic and complex constraint systems (which is rare in common imperative languages), making it knowledge-intensive and error-prone. While large language models (LLMs) have demonstrated strong code generation capabilities in general-purpose languages, their effectiveness for ZK programming, where correctness hinges on both language mastery and constraint-level reasoning, remains unexplored. To address this gap, we propose \textsc{ZK-Eval}, a domain-specific evaluation pipeline that probes LLM capabilities on ZK programming at three levels: language knowledge, algebraic primitive competence, and end-to-end program generation. Our evaluation of four state-of-the-art LLMs reveals that while models demonstrate strong proficiency in language syntax, they struggle when implementing and composing algebraic primitives to specify correct constraint systems, frequently producing incorrect programs. Based on these insights, we introduce \textsc{ZK-Coder}, an agentic framework that augments LLMs with constraint sketching, guided retrieval, and interactive repair. Experiments with GPT-o3 on Circom and Noir show substantial gains, with success rates improving from 20.29\% to 87.85\% and from 28.38\% to 97.79\%, respectively. With \textsc{ZK-Eval} and \textsc{ZK-Coder}, we establish a new basis for systematically measuring and augmenting LLMs in ZK code generation to lower barriers for practitioners and advance privacy computing.
title	From Evaluation to Enhancement: Large Language Models for Zero-Knowledge Proof Code Generation
topic	Software Engineering
url	https://arxiv.org/abs/2509.11708

Similar Items