Affichage MARC: :: Library Catalog

Enregistré dans:

Détails bibliographiques
Auteurs principaux:	Chen, Yuxiao, Wu, Jingzheng, Ling, Xiang, Li, Changjiang, Rui, Zhiqing, Luo, Tianyue, Wu, Yanjun
Format:	Preprint
Publié:	2024
Sujets:	Software Engineering
Accès en ligne:	https://arxiv.org/abs/2403.00448
Tags:	Ajouter un tag Pas de tags, Soyez le premier à ajouter un tag!

_version_	1866916143722135552
author	Chen, Yuxiao Wu, Jingzheng Ling, Xiang Li, Changjiang Rui, Zhiqing Luo, Tianyue Wu, Yanjun
author_facet	Chen, Yuxiao Wu, Jingzheng Ling, Xiang Li, Changjiang Rui, Zhiqing Luo, Tianyue Wu, Yanjun
contents	In recent years, large language models (LLMs) have demonstrated substantial potential in addressing automatic program repair (APR) tasks. However, the current evaluation of these models for APR tasks focuses solely on the limited context of the single function or file where the bug is located, overlooking the valuable information in the repository-level context. This paper investigates the performance of popular LLMs in handling repository-level repair tasks. We introduce RepoBugs, a new benchmark comprising 124 typical repository-level bugs from open-source repositories. Preliminary experiments using GPT3.5 based on the function where the error is located, reveal that the repair rate on RepoBugs is only 22.58%, significantly diverging from the performance of GPT3.5 on function-level bugs in related studies. This underscores the importance of providing repository-level context when addressing bugs at this level. However, the repository-level context offered by the preliminary method often proves redundant and imprecise and easily exceeds the prompt length limit of LLMs. To solve the problem, we propose a simple and universal repository-level context extraction method (RLCE) designed to provide more precise context for repository-level code repair tasks. Evaluations of three mainstream LLMs show that RLCE significantly enhances the ability to repair repository-level bugs. The improvement reaches a maximum of 160% compared to the preliminary method. Additionally, we conduct a comprehensive analysis of the effectiveness and limitations of RLCE, along with the capacity of LLMs to address repository-level bugs, offering valuable insights for future research.
format	Preprint
id	arxiv_https___arxiv_org_abs_2403_00448
institution	arXiv
publishDate	2024
record_format	arxiv
spellingShingle	When Large Language Models Confront Repository-Level Automatic Program Repair: How Well They Done? Chen, Yuxiao Wu, Jingzheng Ling, Xiang Li, Changjiang Rui, Zhiqing Luo, Tianyue Wu, Yanjun Software Engineering In recent years, large language models (LLMs) have demonstrated substantial potential in addressing automatic program repair (APR) tasks. However, the current evaluation of these models for APR tasks focuses solely on the limited context of the single function or file where the bug is located, overlooking the valuable information in the repository-level context. This paper investigates the performance of popular LLMs in handling repository-level repair tasks. We introduce RepoBugs, a new benchmark comprising 124 typical repository-level bugs from open-source repositories. Preliminary experiments using GPT3.5 based on the function where the error is located, reveal that the repair rate on RepoBugs is only 22.58%, significantly diverging from the performance of GPT3.5 on function-level bugs in related studies. This underscores the importance of providing repository-level context when addressing bugs at this level. However, the repository-level context offered by the preliminary method often proves redundant and imprecise and easily exceeds the prompt length limit of LLMs. To solve the problem, we propose a simple and universal repository-level context extraction method (RLCE) designed to provide more precise context for repository-level code repair tasks. Evaluations of three mainstream LLMs show that RLCE significantly enhances the ability to repair repository-level bugs. The improvement reaches a maximum of 160% compared to the preliminary method. Additionally, we conduct a comprehensive analysis of the effectiveness and limitations of RLCE, along with the capacity of LLMs to address repository-level bugs, offering valuable insights for future research.
title	When Large Language Models Confront Repository-Level Automatic Program Repair: How Well They Done?
topic	Software Engineering
url	https://arxiv.org/abs/2403.00448

Documents similaires