Bibliographic Details
Main Authors: Montandon, João Eduardo; Silva, Luciana Lourdes; Politowski, Cristiano; Prates, Daniel; Bonifácio, Arthur de Brito; Boussaidi, Ghizlane El
Format: Preprint
Published: 2024
Subjects: Software Engineering
Online Access: https://arxiv.org/abs/2408.05129
author Montandon, João Eduardo
Silva, Luciana Lourdes
Politowski, Cristiano
Prates, Daniel
Bonifácio, Arthur de Brito
Boussaidi, Ghizlane El
contents Data Science (DS) has become a cornerstone for modern software, enabling data-driven decisions to improve companies' services. Following modern software development practices, data scientists use third-party libraries to support their tasks. As the APIs provided by these libraries often require an extensive list of arguments to be set up, data scientists rely on default values to simplify their usage. These default values can change over time, leading to a specific type of breaking change, defined as Default Argument Breaking Change (DABC). This work reveals 93 DABCs in three Python libraries frequently used in Data Science tasks (Scikit Learn, NumPy, and Pandas) and studies their potential impact on more than 500K client applications. We find that the occurrence of DABCs varies significantly depending on the library: 35% of Scikit Learn clients are affected, while only 0.13% of NumPy clients are impacted. The main reason for introducing DABCs is to enhance API maintainability, but they often change the function's behavior. We discuss the importance of managing DABCs in third-party DS libraries and provide insights for developers to mitigate the potential impact of these changes in their applications.
format Preprint
id arxiv_https___arxiv_org_abs_2408_05129
institution arXiv
publishDate 2024
record_format arxiv
title Unboxing Default Argument Breaking Changes in 1 + 2 Data Science Libraries
topic Software Engineering
url https://arxiv.org/abs/2408.05129