Bibliographic Details
Main Authors:	De Rosa, Giuseppe; Liguori, Pietro
Format:	Preprint
Published:	2026
Subjects:	Software Engineering
Online Access:	https://arxiv.org/abs/2604.26667

_version_	1866909000806694912
author	De Rosa, Giuseppe; Liguori, Pietro
author_facet	De Rosa, Giuseppe; Liguori, Pietro
contents	Python's dynamic nature complicates testing and increases the chance that some defects evade detection, so effective fault prediction becomes essential. We examine whether post-release faults can be predicted using modern machine learning (ML) and deep learning (DL). Using a balanced dataset of over 4,000 labeled faults with 83 product, process, statistical, and Python-specific metrics plus normalized code representations, we conduct cross-project experiments. LLMs and unsupervised models fail to distinguish residual from non-residual faults, while supervised metric-based models (RandomForest, XGBoost, CatBoost) perform far better, yielding a recall of 0.85-0.90 and cutting false negatives by an order of magnitude. Process metrics, especially age, churn, and developer activity, alongside class and file size, consistently prove most predictive. Notably, Principal Component Analysis (PCA) shows that metrics and code embeddings occupy distinct regions of the representation space, suggesting that they capture complementary rather than redundant information.
format	Preprint
id	arxiv_https___arxiv_org_abs_2604_26667
institution	arXiv
publishDate	2026
record_format	arxiv
title	Will It Break in Production? Metric-Driven Prediction of Residual Defects in Python Systems
topic	Software Engineering
url	https://arxiv.org/abs/2604.26667
