Bibliographic details
Main author: XU, Shaopeng
Format: Digital resource
Language:
Published: Zenodo 2024
Subjects:
Online access: https://doi.org/10.5281/zenodo.14536733
_version_ 1866902164098514944
author XU, Shaopeng
author_facet XU, Shaopeng
contents <p>Bug datasets play a vital role in advancing software engineering tasks, including bug detection, fault localization, and automated program repair. These datasets enable the development of more accurate algorithms, facilitate efficient fault identification, and drive the creation of reliable automated repair tools. However, the manual collection and curation of such data are labor-intensive and prone to inconsistency, which limits scalability and reliability. Current datasets often fail to provide detailed and accurate information, particularly regarding bug types, descriptions, and classifications, reducing their utility in diverse research and practical applications. To address these challenges, we introduce BugCatcher, a comprehensive approach for constructing large-scale, high-quality bug datasets. BugCatcher begins by enhancing PR-Issue linking mechanisms, extending data collection to 12 programming languages over a decade, and ensuring accurate linkage between pull requests and issues. It employs a two-stage filtering process, BugCurator, to refine data quality, and utilizes large language models with Zero-shot Chain-of-Thought prompting to generate precise bug types and detailed descriptions. Furthermore, BugCatcher incorporates a robust classification framework, fine-tuning models for improved categorization. The resulting dataset, BugCatcher-Data, includes 243,265 bug-fix entries with comprehensive fields such as code diffs, bug locations, detailed descriptions, and classifications, serving as a substantial resource for advancing software engineering research and practices.</p>
format Digital resource
id zenodo_https___doi_org_10_5281_zenodo_14536733
institution Zenodo
language
publishDate 2024
publisher Zenodo
record_format zenodo
spellingShingle BugCatcher-Data
XU, Shaopeng
bug dataset
title BugCatcher-Data
topic bug dataset
url https://doi.org/10.5281/zenodo.14536733