Staff View :: Library Catalog

Bibliographic Details
Main Authors:	Lyu, Bohan; Huang, Siqiao; Liang, Zichen
Format:	Preprint
Published:	2025
Subjects:	Machine Learning; Computation and Language
Online Access:	https://arxiv.org/abs/2502.11167

_version_	1866918520423448576
author	Lyu, Bohan; Huang, Siqiao; Liang, Zichen
author_facet	Lyu, Bohan; Huang, Siqiao; Liang, Zichen
contents	Neural surrogate models are powerful and efficient tools in data mining. Meanwhile, large language models (LLMs) have demonstrated remarkable capabilities in code-related tasks, such as generation and understanding. However, an equally important yet underexplored question is whether LLMs can serve as surrogate models for code execution prediction. To investigate this question systematically, we introduce SURGE, a comprehensive benchmark with 1160 problems covering 8 key aspects: multi-language programming tasks, competition-level programming problems, repository-level code analysis, high-cost scientific computing, time-complexity-intensive algorithms, buggy code analysis, programs dependent on specific compilers or execution environments, and formal mathematical proof verification. Through extensive analysis of 21 open-source and proprietary LLMs, we examine scaling laws, data efficiency, and predictive accuracy. Our findings reveal important insights about the feasibility of LLMs as efficient surrogates for computational processes. The benchmark and evaluation framework are available at https://github.com/Imbernoulli/SURGE.
format	Preprint
id	arxiv_https___arxiv_org_abs_2502_11167
institution	arXiv
publishDate	2025
record_format	arxiv
title	SURGE: On the Potential of Large Language Models as General-Purpose Surrogate Code Executors
topic	Machine Learning; Computation and Language
url	https://arxiv.org/abs/2502.11167

Similar Items