Bibliographic Details
Main Authors: Liu, Wei; Peng, Chao; Gao, Pengfei; Liu, Aofan; Zhang, Wei; Zhao, Haiyan; Jin, Zhi
Format: Preprint
Published: 2025
Online Access: https://arxiv.org/abs/2512.22469
contents The issue localization task aims to identify the locations in a software repository that require modification, given a natural language issue description. This task is fundamental yet challenging in automated software engineering because of the semantic gap between the issue description and the source code implementation. This gap manifests as two mismatches: (1) symptom-to-cause mismatches, where descriptions do not explicitly reveal the underlying root causes; and (2) one-to-many mismatches, where a single issue corresponds to multiple interdependent code entities. To address these two mismatches, we propose GraphLocator, an approach that mitigates symptom-to-cause mismatches through causal structure discovering and resolves one-to-many mismatches via dynamic issue disentangling. The key artifact is the causal issue graph (CIG), in which vertices represent discovered sub-issues along with their associated code entities, and edges encode the causal dependencies between them. The workflow of GraphLocator consists of two phases, symptom vertices locating and dynamic CIG discovering: it first identifies symptom locations on the repository graph, then dynamically expands the CIG by iteratively reasoning over neighboring vertices. Experiments on three real-world datasets demonstrate the effectiveness of GraphLocator: (1) Compared with baselines, GraphLocator achieves more accurate localization, with average improvements of +19.49% in function-level recall and +11.89% in precision. (2) GraphLocator outperforms baselines in both symptom-to-cause and one-to-many mismatch scenarios, achieving recall improvements of +16.44% and +19.18% and precision improvements of +7.78% and +13.23%, respectively. (3) The CIG generated by GraphLocator yields the highest relative improvement, resulting in a 28.74% increase in performance on the downstream resolving task.
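The two-phase workflow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names CausalIssueGraph, discover_cig, and toy_is_cause, the dictionary-based repository graph, and the breadth-first expansion order are all assumptions made for illustration. The reasoning step, which the abstract attributes to iterative reasoning over neighboring vertices, is stood in for by a caller-supplied function.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class CausalIssueGraph:
    """Hypothetical CIG container (illustrative, not the paper's data model)."""
    # Vertex: code entity name -> description of the sub-issue located there.
    sub_issues: dict = field(default_factory=dict)
    # Edge: (cause_entity, effect_entity) causal dependency.
    edges: set = field(default_factory=set)


def discover_cig(repo_graph, symptoms, is_cause, max_steps=50):
    """Expand a CIG outward from symptom vertices.

    repo_graph: dict mapping an entity to its neighboring entities.
    symptoms:   dict mapping each symptom entity to its sub-issue text
                (the assumed output of the symptom-locating phase).
    is_cause:   callable(candidate, effect, sub_issue) returning a new
                sub-issue description, or None if no causal link is judged
                to exist. A placeholder for the reasoning step.
    """
    cig = CausalIssueGraph(sub_issues=dict(symptoms))
    frontier = deque(symptoms)
    steps = 0
    while frontier and steps < max_steps:
        effect = frontier.popleft()
        steps += 1
        for candidate in repo_graph.get(effect, []):
            explanation = is_cause(candidate, effect, cig.sub_issues[effect])
            if explanation is None:
                continue
            cig.edges.add((candidate, effect))
            # Only unseen entities become new vertices to expand later.
            if candidate not in cig.sub_issues:
                cig.sub_issues[candidate] = explanation
                frontier.append(candidate)
    return cig


# Toy repository graph and a rule-based stand-in for the reasoning step.
repo_graph = {
    "api.render": ["cache.get", "utils.log"],
    "cache.get": ["cache.evict"],
    "cache.evict": [],
    "utils.log": [],
}
symptoms = {"api.render": "page shows stale data"}


def toy_is_cause(candidate, effect, sub_issue):
    # Assumption for the demo: only cache-related entities are causes.
    if candidate.startswith("cache."):
        return f"{candidate} may explain: {sub_issue}"
    return None


cig = discover_cig(repo_graph, symptoms, toy_is_cause)
print(sorted(cig.sub_issues))
print(sorted(cig.edges))
```

In this toy run the single symptom vertex is traced back through two cache entities, so one reported symptom ends up associated with several interdependent code entities, which is the one-to-many situation the abstract describes.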
id arxiv_https___arxiv_org_abs_2512_22469
institution arXiv
record_format arxiv
title GraphLocator: Graph-guided Causal Reasoning for Issue Localization
topic Software Engineering