Salvato in:
Dettagli Bibliografici
Autori principali: Wright, Tanner, Chen, Adams, Rodríguez-Pérez, Gema
Natura: Preprint
Pubblicazione: 2026
Soggetti:
Accesso online:https://arxiv.org/abs/2602.13400
Tags: Aggiungi Tag
Nessun Tag, puoi essere il primo ad aggiungerne!!
_version_ 1866910029773275136
author Wright, Tanner
Chen, Adams
Rodríguez-Pérez, Gema
author_facet Wright, Tanner
Chen, Adams
Rodríguez-Pérez, Gema
contents Understanding the causes of software defects is essential for reliable software maintenance and ecosystem stability. However, existing bug datasets do not distinguish between issues originating within a project from those caused by external dependencies or environmental factors. In this paper we present InEx-Bug, a manually annotated dataset of 377 GitHub issues from 103 NPM repositories, categorizing issues as Intrinsic (internal defect), Extrinsic (dependency/environment issue), Not-a-Bug, or Unknown. Beyond labels, the dataset includes rich temporal and behavioral metadata such as maintainer participation, code changes, and reopening patterns. Analyses show Intrinsic bugs resolve faster (median 8.9 vs 10.2 days), are close more often (92% vs 78%), and require code changes more frequently (57% vs 28%) compared to Extrinsic bugs. While Extrinsic bugs exhibit higher reopen rates (12% vs 4%) and delayed recurrence (median 157 vs 87 days). The dataset provides a foundation for further studying Intrinsic and Extrinsic defects in the NPM ecosystem.
format Preprint
id arxiv_https___arxiv_org_abs_2602_13400
institution arXiv
publishDate 2026
record_format arxiv
spellingShingle InEx-Bug: A Human Annotated Dataset of Intrinsic and Extrinsic Bugs in the NPM Ecosystem
Wright, Tanner
Chen, Adams
Rodríguez-Pérez, Gema
Software Engineering
Programming Languages
Understanding the causes of software defects is essential for reliable software maintenance and ecosystem stability. However, existing bug datasets do not distinguish between issues originating within a project from those caused by external dependencies or environmental factors. In this paper we present InEx-Bug, a manually annotated dataset of 377 GitHub issues from 103 NPM repositories, categorizing issues as Intrinsic (internal defect), Extrinsic (dependency/environment issue), Not-a-Bug, or Unknown. Beyond labels, the dataset includes rich temporal and behavioral metadata such as maintainer participation, code changes, and reopening patterns. Analyses show Intrinsic bugs resolve faster (median 8.9 vs 10.2 days), are close more often (92% vs 78%), and require code changes more frequently (57% vs 28%) compared to Extrinsic bugs. While Extrinsic bugs exhibit higher reopen rates (12% vs 4%) and delayed recurrence (median 157 vs 87 days). The dataset provides a foundation for further studying Intrinsic and Extrinsic defects in the NPM ecosystem.
title InEx-Bug: A Human Annotated Dataset of Intrinsic and Extrinsic Bugs in the NPM Ecosystem
topic Software Engineering
Programming Languages
url https://arxiv.org/abs/2602.13400