Gespeichert in:
Bibliographische Detailangaben
Hauptverfasser: Maarleveld, Jesse, Guo, Jiapan, Feitosa, Daniel
Format: Preprint
Veröffentlicht: 2025
Schlagworte:
Online-Zugang:https://arxiv.org/abs/2507.18319
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
_version_ 1866912499803095040
author Maarleveld, Jesse
Guo, Jiapan
Feitosa, Daniel
author_facet Maarleveld, Jesse
Guo, Jiapan
Feitosa, Daniel
contents Bug localisation, the study of developing methods to localise the files requiring changes to resolve bugs, has been researched for a long time to develop methods capable of saving developers' time. Recently, researchers are starting to consider issues outside of bugs. Nevertheless, most existing research into file localisation from issues focusses on bugs or uses other selection methods to ensure only certain types of issues are considered as part of the focus of the work. Our goal is to work on all issues at large, without any specific selection. In this work, we provide a data pipeline for the creation of issue file localisation datasets, capable of dealing with arbitrary branching and merging practices. We provide a baseline performance evaluation for the file localisation problem using traditional information retrieval approaches. Finally, we use statistical analysis to investigate the influence of biases known in the bug localisation community on our dataset. Our results show that methods designed using bug-specific heuristics perform poorly on general issue types, indicating a need for research into general purpose models. Furthermore, we find that there are small, but statistically significant differences in performance between different issue types. Finally, we find that the presence of identifiers have a small effect on performance for most issue types. Many results are project-dependent, encouraging the development of methods which can be tuned to project-specific characteristics.
format Preprint
id arxiv_https___arxiv_org_abs_2507_18319
institution arXiv
publishDate 2025
record_format arxiv
spellingShingle Gotta catch 'em all! Towards File Localisation from Issues at Large
Maarleveld, Jesse
Guo, Jiapan
Feitosa, Daniel
Software Engineering
Bug localisation, the study of developing methods to localise the files requiring changes to resolve bugs, has been researched for a long time to develop methods capable of saving developers' time. Recently, researchers are starting to consider issues outside of bugs. Nevertheless, most existing research into file localisation from issues focusses on bugs or uses other selection methods to ensure only certain types of issues are considered as part of the focus of the work. Our goal is to work on all issues at large, without any specific selection. In this work, we provide a data pipeline for the creation of issue file localisation datasets, capable of dealing with arbitrary branching and merging practices. We provide a baseline performance evaluation for the file localisation problem using traditional information retrieval approaches. Finally, we use statistical analysis to investigate the influence of biases known in the bug localisation community on our dataset. Our results show that methods designed using bug-specific heuristics perform poorly on general issue types, indicating a need for research into general purpose models. Furthermore, we find that there are small, but statistically significant differences in performance between different issue types. Finally, we find that the presence of identifiers have a small effect on performance for most issue types. Many results are project-dependent, encouraging the development of methods which can be tuned to project-specific characteristics.
title Gotta catch 'em all! Towards File Localisation from Issues at Large
topic Software Engineering
url https://arxiv.org/abs/2507.18319