Bibliographic Details
Main Authors: Aritra Sarkar, shaheenaliii, Varshini S
Format: Digital resource
Published: Zenodo 2026
Online Access: https://doi.org/10.5281/zenodo.19642813
author Aritra Sarkar
shaheenaliii
Varshini S
contents <h2>What's New</h2> <h3>Config-Driven Pipeline</h3> <ul> <li>Centralized <code>config.yaml</code> replacing all hardcoded constants</li> <li>New <code>config_loader.py</code> for dynamic pipeline configuration</li> <li>Schema validation of the YAML file <code>datasets.yaml</code> to ensure data integrity</li> </ul> <h3>Reproducibility & Publication</h3> <ul> <li>Added <code>REPRODUCIBILITY.md</code> with full reproduction steps</li> <li>Bundled the SoftwareX paper (<code>paper.tex</code>, <code>paper.bib</code>)</li> <li>Added a TikZ architecture diagram (<code>architecture.tex</code>)</li> </ul> <h3>CI & Infrastructure Fixes</h3> <ul> <li>Fixed flake8 F824: removed unused <code>global</code> declarations in <code>state_mapping.py</code></li> <li>Fixed the Docker build: switched from <code>openjdk-21</code> to <code>openjdk-17</code> for Debian Bookworm compatibility</li> <li>Fixed test failures: added LADAKH to the canonical states and corrected the merged UT mappings</li> <li>All 84 tests pass on Python 3.9, 3.10, and 3.11</li> </ul> <h3>State Mapping</h3> <ul> <li>YAML-driven state name overrides (<code>state_config.yaml</code>)</li> <li>Added Ladakh as a canonical state/UT</li> <li>Correct handling of aliases for the merged UT of Dadra & Nagar Haveli and Daman & Diu</li> </ul>
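The release notes mention schema validation for `datasets.yaml` but do not describe the schema. The following is a minimal sketch of what such a check might look like, assuming a hypothetical layout in which the file holds a top-level `datasets` list whose entries each need a string `name`, a string `path`, and an integer `year`. These field names and the `validate_datasets` helper are illustrative assumptions, not the project's actual code. The function works on an already-parsed mapping (for example, the result of PyYAML's `yaml.safe_load`) so that the sketch runs with the standard library alone.

```python
from typing import Any, Dict, List

# Hypothetical schema (assumption): required field name -> expected Python type.
REQUIRED_FIELDS: Dict[str, type] = {"name": str, "path": str, "year": int}


def validate_datasets(doc: Any) -> List[str]:
    """Return every problem found in a parsed datasets document; [] means valid."""
    if not isinstance(doc, dict) or not isinstance(doc.get("datasets"), list):
        return ["top level must be a mapping with a 'datasets' list"]

    errors: List[str] = []
    seen_names = set()
    for i, entry in enumerate(doc["datasets"]):
        if not isinstance(entry, dict):
            errors.append(f"datasets[{i}]: expected a mapping")
            continue
        for field, expected in REQUIRED_FIELDS.items():
            if field not in entry:
                errors.append(f"datasets[{i}]: missing '{field}'")
            # bool is a subclass of int, so reject it explicitly for int fields.
            elif not isinstance(entry[field], expected) or isinstance(entry[field], bool):
                errors.append(f"datasets[{i}].{field}: expected {expected.__name__}")
        # Dataset names must be unique so downstream jobs can refer to them unambiguously.
        name = entry.get("name")
        if isinstance(name, str):
            if name in seen_names:
                errors.append(f"datasets[{i}]: duplicate name '{name}'")
            seen_names.add(name)
    return errors
```

Collecting every problem instead of raising on the first one lets a single CI run report all malformed entries at once.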
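The state-mapping changes (Ladakh as a canonical UT, aliases for the merged UT of Dadra and Nagar Haveli and Daman and Diu, and YAML-driven overrides) suggest a normalize-then-look-up design. The sketch below assumes that design. The canonical list is a small illustrative subset, and `CANONICAL_STATES`, `ALIASES`, and `normalize_state` are hypothetical names, not the contents of `state_mapping.py`. The `overrides` argument stands in for a mapping that would be read from `state_config.yaml`, whose real structure is not documented here.

```python
from typing import Dict, Optional

# Hypothetical, illustrative subset of canonical state/UT names (not the project's list).
CANONICAL_STATES = {
    "ANDHRA PRADESH",
    "DADRA AND NAGAR HAVELI AND DAMAN AND DIU",  # UT formed by the 2020 merger
    "DELHI",
    "JAMMU AND KASHMIR",
    "LADAKH",  # UT since 2019
    "ODISHA",
    "TAMIL NADU",
}

# Hypothetical built-in aliases, keyed by the cleaned form that _clean() produces.
ALIASES: Dict[str, str] = {
    "DADRA AND NAGAR HAVELI": "DADRA AND NAGAR HAVELI AND DAMAN AND DIU",
    "D AND N HAVELI": "DADRA AND NAGAR HAVELI AND DAMAN AND DIU",
    "DAMAN AND DIU": "DADRA AND NAGAR HAVELI AND DAMAN AND DIU",
    "NCT OF DELHI": "DELHI",
}


def _clean(name: str) -> str:
    """Upper-case, spell '&' as 'AND', and collapse runs of whitespace."""
    return " ".join(name.upper().replace("&", " AND ").split())


def normalize_state(raw: str, overrides: Optional[Dict[str, str]] = None) -> str:
    """Map a raw state/UT label to its canonical name, or raise ValueError.

    `overrides` is an assumed stand-in for entries loaded from state_config.yaml
    (raw name -> replacement name); it is applied before the built-in aliases.
    """
    key = _clean(raw)
    cleaned_overrides = {_clean(k): _clean(v) for k, v in (overrides or {}).items()}
    key = cleaned_overrides.get(key, key)
    key = ALIASES.get(key, key)
    if key not in CANONICAL_STATES:
        raise ValueError(f"Unknown state/UT: {raw!r}")
    return key
```

For example, `normalize_state("Daman & Diu")` and `normalize_state("dadra and nagar haveli")` both resolve to the merged UT, and `normalize_state("Orissa", {"Orissa": "Odisha"})` returns `"ODISHA"`. Applying overrides before aliases lets a configuration file correct a label without editing the code, while unknown names still fail loudly instead of silently creating a new state.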
format Digital resource
id zenodo_https___doi_org_10_5281_zenodo_19642813
institution Zenodo
publishDate 2026
publisher Zenodo
record_format zenodo
title aritra0309/hadoop-crime-project: Config-Driven Pipeline and Reproducibility Package
url https://doi.org/10.5281/zenodo.19642813