Bibliographic Details
Main Authors: Wang, Dong; Yu, Junji; Shu, Honglin; Fu, Michael; Tantithamthavorn, Chakkrit; Kamei, Yasutaka; Chen, Junjie
Format: Preprint
Published: 2025
Subjects: Software Engineering
Online Access: https://arxiv.org/abs/2508.03470
author Wang, Dong
Yu, Junji
Shu, Honglin
Fu, Michael
Tantithamthavorn, Chakkrit
Kamei, Yasutaka
Chen, Junjie
contents Various deep learning-based approaches built on pre-trained language models have been proposed for automatically repairing software vulnerabilities. However, these approaches are limited to a specific programming language (C/C++). Recent advances in large language models (LLMs) offer language-agnostic capabilities and strong semantic understanding, showing potential to overcome the limitations of multilingual vulnerability repair. Although some work has begun to explore LLMs' repair performance, their effectiveness remains unsatisfactory. To address these limitations, we conducted a large-scale empirical study to investigate the performance of automated vulnerability repair approaches and state-of-the-art LLMs across seven programming languages. Results show that GPT-4o, instruction-tuned with few-shot prompting, performs competitively against the leading approach, VulMaster. Additionally, the LLM-based approach shows superior performance in repairing unique vulnerabilities and is more likely to repair the most dangerous vulnerabilities. Instruction-tuned GPT-4o demonstrates strong generalization to vulnerabilities in a previously unseen language, outperforming existing approaches. Analysis shows that Go consistently achieves the highest effectiveness across all model types, while C/C++ performs the worst. Based on these findings, we discuss the promise of LLMs for multilingual vulnerability repair and the reasons behind their failed cases. This work takes the first look at repair approaches and LLMs across multiple languages, highlighting the promising future of adopting LLMs for multilingual vulnerability repair.
format Preprint
id arxiv_https___arxiv_org_abs_2508_03470
institution arXiv
publishDate 2025
record_format arxiv
title On the Evaluation of Large Language Models in Multilingual Vulnerability Repair
topic Software Engineering
url https://arxiv.org/abs/2508.03470