Bibliographic Details
Main Authors: Dervişoğlu, Havvanur, Halepmollası, Ruşen, Eyvaz, Elif
Format: Preprint
Published: 2025
Subjects: Software Engineering
Online Access: https://arxiv.org/abs/2506.22752
_version_ 1866908427212554240
author Dervişoğlu, Havvanur
Halepmollası, Ruşen
Eyvaz, Elif
contents Bug severity prediction is a critical task in software engineering because it enables more efficient resource allocation and prioritization in software maintenance. AI-based analyses and models require access to extensive datasets, but industrial applications face data-sharing constraints and a limited supply of labeled data. In this study, we investigate method-level bug severity prediction using source code metrics and Large Language Models (LLMs). We compare the performance of models trained using centralized learning, federated learning, and synthetic data generation. Our experimental results, obtained on two widely used software defect datasets, indicate that models trained with federated learning and synthetic data achieve results comparable to those of centrally trained models, without requiring data sharing. Our findings highlight the potential of privacy-preserving approaches such as federated learning and synthetic data generation to enable effective bug severity prediction in industrial contexts where data sharing is a major challenge. The source code and dataset are available at our GitHub repository: https://github.com/drvshavva/EASE2025-Privacy-Preserving-Methods-for-Bug-Severity-Prediction.
format Preprint
id arxiv_https___arxiv_org_abs_2506_22752
institution arXiv
publishDate 2025
record_format arxiv
title Privacy-Preserving Methods for Bug Severity Prediction
topic Software Engineering
url https://arxiv.org/abs/2506.22752