Bibliographic Details
Main Authors: Sayedsalehi, Ali, Rigby, Peter, Mierzwinski, Gregory
Format: Digital resource
Language:
Published: Zenodo 2026
Online Access: https://doi.org/10.5281/zenodo.19355932
_version_ 1866901600217333760
author Sayedsalehi, Ali
Rigby, Peter
Mierzwinski, Gregory
contents <p>This repository is the replication package for the paper "Risk-Aware Batch Testing for Performance Regression Detection". It contains the complete artifact chain used in the paper: the JIT-Mozilla-Perf dataset, the data extraction pipeline, model fine-tuning and inference code for commit-level performance regression prediction, and the replay-based CI simulation framework used to evaluate batching strategies.</p>
<p>The companion JIT-Mozilla-Perf dataset is archived separately on Zenodo at https://doi.org/10.5281/zenodo.18829344.</p>
<p>This replication package is also available on GitHub:<br>https://github.com/Ali-Sayed-Salehi/jit-dp-llm/tree/zenodo-batch-perf</p>
<p>The package supports reproduction of the paper’s full workflow:<br>(1) construction of the JIT-Mozilla-Perf dataset from Mozilla production data sources,<br>(2) fine-tuning of commit-level performance regression risk models,<br>(3) inference to generate chronological commit risk scores, and<br>(4) replay-based simulation of risk-aware batching strategies.</p>
<p>The core dataset used by the paper is stored under datasets/mozilla_perf/. Its main modeling artifact, perf_llm_struc_no_fw_2_6_18.jsonl, contains 11,384 chronologically ordered commit instances derived from Mozilla performance alerts, Bugzilla performance bugs, and Mercurial Autoland history.</p>
<p>The repository also includes the simulation metadata needed to model realistic performance testing behavior, including failing performance signatures, signature groups, per-revision coverage, and job-duration estimates. The files under datasets/mozilla_perf/ in this replication package belong to the same dataset family used in the paper and are the artifacts consumed by the training and simulation code documented here.</p>
<p>The replication package includes prediction artifacts that can be used directly as simulator inputs, including:<br>- analysis/batch_testing/final_test_results_perf_codebert_eval.json<br>- analysis/batch_testing/final_test_results_perf_codebert_final_test.json</p>
<p>These artifacts allow users to rerun the main Optuna-based batch-testing experiments without retraining models.</p>
<p>The paper evaluates ModernBERT, CodeBERT, and LLaMA 3.1 8B as performance regression risk predictors, then uses their risk scores to drive batching strategies such as Time-Window Batching (TWB), Fixed-Size Batching (FSB), Risk-Adaptive Stream Batching (RASB), Risk-Aged Priority Batching (RAPB), and Risk-Adaptive Trigger Batching (RATB). The main reported result is that RAPB-la provides the strongest overall balance between cost and timeliness: relative to the production-inspired baseline, it reduces total tests by 32.4%, reduces maximum time-to-culprit by 26.2%, and yields estimated annual infrastructure savings of about $491K.</p>
<p>The paper-relevant repository paths are:<br>- datasets/mozilla_perf/<br>- data_extraction/treeherder/<br>- data_extraction/bugzilla/<br>- data_extraction/mercurial/<br>- data_extraction/data_preparation.py<br>- llama/<br>- analysis/batch_testing/<br>- slurm_scripts/speed/<br>- docker/Dockerfile.llama-train-environment</p>
<p>Detailed reproduction instructions are provided in the repository README. The fastest rerun path is to use the packaged CodeBERT prediction JSON files as inputs to analysis/batch_testing/simulation.py. The full regeneration path rebuilds the dataset, fine-tunes the risk predictors, runs inference on the eval and test splits, and then reruns the simulator.</p>
format Digital resource
id zenodo_https___doi_org_10_5281_zenodo_19355932
institution Zenodo
language
publishDate 2026
publisher Zenodo
record_format zenodo
title Replication Package for "Risk-Aware Batch Testing for Performance Regression Detection"
url https://doi.org/10.5281/zenodo.19355932
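The description above names Fixed-Size Batching (FSB) as one of the evaluated strategies and reports results in terms of total tests. As a rough illustration of why batching can reduce test runs, the following toy sketch groups chronologically ordered commits into fixed-size batches and bisects only the batches that fail. It is not the package's simulator: the function names, the boolean `failing` list, and the assumption that a group fails exactly when it contains a regressing commit are all hypothetical simplifications introduced here.

```python
from typing import List, Sequence, Tuple


def find_culprits(failing: Sequence[bool], lo: int, hi: int,
                  runs: List[int]) -> List[int]:
    """Test commits lo..hi-1 as one group; on failure, split and recurse.

    Assumption (not from the paper): a group test fails exactly when
    at least one commit in the group is a regression.
    """
    runs[0] += 1  # one test run is spent on this group
    if not any(failing[lo:hi]):
        return []  # group passes, nothing to locate
    if hi - lo == 1:
        return [lo]  # a failing single commit is a culprit
    mid = (lo + hi) // 2
    return (find_culprits(failing, lo, mid, runs)
            + find_culprits(failing, mid, hi, runs))


def fixed_size_batching(failing: Sequence[bool],
                        batch_size: int) -> Tuple[List[int], int]:
    """Return (culprit indices, total test runs) for fixed-size batches."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    runs = [0]  # mutable counter shared with the recursive helper
    culprits: List[int] = []
    for start in range(0, len(failing), batch_size):
        end = min(start + batch_size, len(failing))
        culprits += find_culprits(failing, start, end, runs)
    return culprits, runs[0]


if __name__ == "__main__":
    # Eight synthetic commits in chronological order; only index 5 regresses.
    history = [False, False, False, False, False, True, False, False]
    print(fixed_size_batching(history, batch_size=4))
    print(fixed_size_batching(history, batch_size=1))
```

On this synthetic history, a batch size of 4 locates the regressing commit at index 5 with 6 test runs, while a batch size of 1 (testing every commit) needs 8. Risk-aware strategies such as those listed in the description would, in addition, use predicted risk scores to decide how batches are formed, which this sketch does not attempt to model.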