Bibliographic Details
Main Author: Auch, Maximilian
Format: Digital resource
Published: Zenodo 2025
Online Access: https://doi.org/10.5281/zenodo.17846204
Table of Contents:
  • <h2><strong>MATILDA Design Decision Dataset (2010–2023)</strong></h2> <p>This dataset comprises historical development data and design decisions from the Java ecosystem, extracted from publicly available GitHub repositories. It focuses on the evolution of software dependencies and on identifying technology migrations from build configuration files (Maven pom.xml).</p> <p><strong>Data Basis and Scope</strong><br>The dataset covers the period from 2010 to 2023 and is based on an analysis of approximately 180,000 software projects. It includes:</p> <ul> <li>3.1 million revisions with complete version history.</li> <li>25.7 million analyzed software components, classified using RNN-based methods.</li> <li>114,202 software projects from which valid design decisions could be extracted.</li> </ul> <p><strong>Extracted Decisions</strong><br>Changes in the technology stack were identified by comparing revision states and then categorized:</p> <ul> <li>1.55 million design decisions in total, each adding or removing a library.</li> <li>136,472 migration decisions (8.8% of all design decisions), in which a technology was replaced by a functional alternative.</li> <li>74 library categories, with a special focus on databases, application servers, UI frameworks, and messaging systems.</li> </ul> <p><strong>Structure and Formats</strong><br>The data is available in three processing stages:</p> <ul> <li>Raw Data (MongoDB): Complete history including branches and README files.</li> <li>Relational Data (PostgreSQL): Normalized design and migration decisions.</li> <li>Graph Data (Neo4j): 2.5 million revision nodes and their relationships to 140 technologies, modeled for analyzing migration paths.</li> </ul> <p><strong>Application Areas</strong><br>The dataset is suitable for empirical software engineering research, particularly for analyzing technology trends, investigating library migrations, and training recommendation systems in the field of software architecture.</p>
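The revision comparison described under "Extracted Decisions" can be illustrated with a minimal sketch. It is not the dataset's actual extraction pipeline: the function names, the `make_pom` helper, and the hand-written category mapping are all hypothetical, and the mapping stands in for the RNN-based classification mentioned above. The sketch diffs the direct dependencies of two pom.xml revisions and flags a removal paired with an addition in the same category as a candidate migration.

```python
import xml.etree.ElementTree as ET

# Standard Maven POM namespace; real pom.xml files declare it on <project>.
POM_NS = {"m": "http://maven.apache.org/POM/4.0.0"}


def dependencies(pom_xml):
    """Return 'groupId:artifactId' for each direct dependency in a pom.xml string."""
    root = ET.fromstring(pom_xml)
    deps = set()
    # Only <project>/<dependencies>; <dependencyManagement> entries are ignored.
    for dep in root.findall("./m:dependencies/m:dependency", POM_NS):
        group = dep.findtext("m:groupId", default="", namespaces=POM_NS).strip()
        artifact = dep.findtext("m:artifactId", default="", namespaces=POM_NS).strip()
        deps.add(f"{group}:{artifact}")
    return deps


def diff_revisions(old_pom, new_pom, category_of):
    """Compare two revisions; return (added, removed, candidate migrations).

    category_of is an assumed dict mapping library coordinates to a category
    name. A migration candidate is a removed library paired with an added
    library of the same category.
    """
    old, new = dependencies(old_pom), dependencies(new_pom)
    added, removed = new - old, old - new
    migrations = [
        (r, a)
        for r in sorted(removed)
        for a in sorted(added)
        if r in category_of and category_of.get(r) == category_of.get(a)
    ]
    return added, removed, migrations


def make_pom(*coords):
    """Build a minimal pom.xml string for demonstration (hypothetical helper)."""
    deps = "".join(
        f"<dependency><groupId>{g}</groupId><artifactId>{a}</artifactId></dependency>"
        for g, a in (c.split(":") for c in coords)
    )
    return (
        '<project xmlns="http://maven.apache.org/POM/4.0.0">'
        f"<dependencies>{deps}</dependencies></project>"
    )
```

Calling `diff_revisions(make_pom("mysql:mysql-connector-java"), make_pom("org.postgresql:postgresql"), categories)` with both coordinates mapped to the same category would report one added library, one removed library, and one candidate migration. A real pipeline would also need to handle version properties, parent POMs, and dependency scopes, which this sketch leaves out.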