Bibliographic Details
Main Authors: Xu, Hexiang; Liu, Hengyuan; Wu, Yonghao; Kang, Xiaolan; Chen, Xiang; Liu, Yong
Format: Preprint
Published: 2025
Subjects: Software Engineering
Online Access: https://arxiv.org/abs/2512.03421
author Xu, Hexiang
Liu, Hengyuan
Wu, Yonghao
Kang, Xiaolan
Chen, Xiang
Liu, Yong
contents Novice programmers often struggle with fault localization because of their limited experience and understanding of programming syntax and logic. Traditional methods such as Spectrum-Based Fault Localization (SBFL) and Mutation-Based Fault Localization (MBFL) help identify faults but often cannot interpret code context, making them less effective for beginners. In recent years, Large Language Models (LLMs) have shown promise in overcoming these limitations through their ability to understand program syntax and semantics. LLM-based fault localization provides more accurate and context-aware results than traditional techniques. This study evaluates six closed-source and seven open-source LLMs on the Codeflaws, ConDefects, and BugT datasets; BugT is a newly constructed dataset designed specifically to mitigate data leakage concerns. Advanced models with reasoning capabilities, such as OpenAI o3 and DeepSeek-R1, achieve superior accuracy with minimal reliance on prompt engineering. In contrast, models without reasoning capabilities, such as GPT-4, require carefully designed prompts to maintain performance. While LLMs perform well on simple fault localization tasks, their accuracy decreases as problem difficulty increases, though top models maintain robust performance on the BugT dataset. Over-reasoning is another challenge: some models generate excessive explanations that obscure the fault localization result. Additionally, the computational cost of deploying LLMs remains a significant barrier to real-time debugging. LLM-generated explanations are of significant value to novice programmers, with participants who have one year of programming experience consistently rating them highly. Our findings demonstrate the potential of LLMs to improve debugging efficiency while underscoring the need to further refine their reasoning and computational efficiency before practical adoption.
format Preprint
id arxiv_https___arxiv_org_abs_2512_03421
institution arXiv
publishDate 2025
record_format arxiv
title Exploring the Potential and Limitations of Large Language Models for Novice Program Fault Localization
topic Software Engineering
url https://arxiv.org/abs/2512.03421