Staff View: On the Evaluation of Large Language Models in Multilingual Vulnerability Repair :: Library Catalog

Bibliographic Details
Main Authors:	Wang, Dong; Yu, Junji; Shu, Honglin; Fu, Michael; Tantithamthavorn, Chakkrit; Kamei, Yasutaka; Chen, Junjie
Format:	Preprint
Published:	2025
Subjects:	Software Engineering
Online Access:	https://arxiv.org/abs/2508.03470

_version_	1866913976230608896
author	Wang, Dong; Yu, Junji; Shu, Honglin; Fu, Michael; Tantithamthavorn, Chakkrit; Kamei, Yasutaka; Chen, Junjie
author_facet	Wang, Dong; Yu, Junji; Shu, Honglin; Fu, Michael; Tantithamthavorn, Chakkrit; Kamei, Yasutaka; Chen, Junjie
contents	Various deep learning-based approaches with pre-trained language models have been proposed for automatically repairing software vulnerabilities. However, these approaches are limited to a specific programming language (C/C++). Recent advances in large language models (LLMs) offer language-agnostic capabilities and strong semantic understanding, showing potential to overcome this limitation and repair vulnerabilities across multiple languages. Although some work has begun to explore LLMs' repair performance, their effectiveness is unsatisfactory. To address these limitations, we conducted a large-scale empirical study to investigate the performance of automated vulnerability repair approaches and state-of-the-art LLMs across seven programming languages. Results show that GPT-4o, instruction-tuned with few-shot prompting, performs competitively against the leading approach, VulMaster. Additionally, the LLM-based approach shows superior performance in repairing unique vulnerabilities and is more likely to repair the most dangerous vulnerabilities. Instruction-tuned GPT-4o demonstrates strong generalization to vulnerabilities in a previously unseen language, outperforming existing approaches. Analysis shows that Go consistently achieves the highest effectiveness across all model types, while C/C++ performs the worst. Based on these findings, we discuss the promise of LLMs for multilingual vulnerability repair and the reasons behind their failed cases. This work takes a first look at repair approaches and LLMs across multiple languages, highlighting the promising future of adopting LLMs for multilingual vulnerability repair.
format	Preprint
id	arxiv_https___arxiv_org_abs_2508_03470
institution	arXiv
publishDate	2025
record_format	arxiv
title	On the Evaluation of Large Language Models in Multilingual Vulnerability Repair
topic	Software Engineering
url	https://arxiv.org/abs/2508.03470

Similar Items