| Main authors: | Sayedsalehi, Ali; Rigby, Peter; Mierzwinski, Gregory |
|---|---|
| Format: | Digital resource |
| Language: | |
| Published: | Zenodo, 2026 |
| Online access: | https://doi.org/10.5281/zenodo.19355932 |
| _version_ | 1866901600217333760 |
|---|---|
| author | Sayedsalehi, Ali; Rigby, Peter; Mierzwinski, Gregory |
| contents | <p>This repository is the replication package for the paper "Risk-Aware Batch Testing for Performance Regression Detection". It contains the complete artifact chain used in the paper: the JIT-Mozilla-Perf dataset, the data extraction pipeline, model fine-tuning and inference code for commit-level performance regression prediction, and the replay-based CI simulation framework used to evaluate batching strategies.</p> <p>The companion JIT-Mozilla-Perf dataset is archived separately on Zenodo at https://doi.org/10.5281/zenodo.18829344.</p> <p>This replication package is also available on GitHub:<br>https://github.com/Ali-Sayed-Salehi/jit-dp-llm/tree/zenodo-batch-perf</p> <p>The package supports reproduction of the paper’s full workflow:<br>(1) construction of the JIT-Mozilla-Perf dataset from Mozilla production data sources,<br>(2) fine-tuning of commit-level performance regression risk models,<br>(3) inference to generate chronological commit risk scores, and<br>(4) replay-based simulation of risk-aware batching strategies.</p> <p>The core dataset used by the paper is stored under datasets/mozilla_perf/. Its main modeling artifact, perf_llm_struc_no_fw_2_6_18.jsonl, contains 11,384 chronologically ordered commit instances derived from Mozilla performance alerts, Bugzilla performance bugs, and Mercurial Autoland history.</p> <p>The repository also includes the simulation metadata needed to model realistic performance testing behavior, including failing performance signatures, signature groups, per-revision coverage, and job-duration estimates. The files under datasets/mozilla_perf/ in this replication package belong to the same dataset family used in the paper and are the artifacts consumed by the training and simulation code documented here.</p> <p>The replication package includes prediction artifacts that can be used directly as simulator inputs, including:<br>- analysis/batch_testing/final_test_results_perf_codebert_eval.json<br>- analysis/batch_testing/final_test_results_perf_codebert_final_test.json</p> <p>These artifacts allow users to rerun the main Optuna-based batch-testing experiments without retraining models.</p> <p>The paper evaluates ModernBERT, CodeBERT, and LLaMA 3.1 8B as performance regression risk predictors, then uses their risk scores to drive batching strategies such as Time-Window Batching (TWB), Fixed-Size Batching (FSB), Risk-Adaptive Stream Batching (RASB), Risk-Aged Priority Batching (RAPB), and Risk-Adaptive Trigger Batching (RATB). The main reported result is that RAPB-la provides the strongest overall balance between cost and timeliness, reducing total tests by 32.4%, reducing maximum time-to-culprit by 26.2%, and yielding an estimated annual infrastructure savings of about $491K relative to the production-inspired baseline.</p> <p>The paper-relevant repository paths are:<br>- datasets/mozilla_perf/<br>- data_extraction/treeherder/<br>- data_extraction/bugzilla/<br>- data_extraction/mercurial/<br>- data_extraction/data_preparation.py<br>- llama/<br>- analysis/batch_testing/<br>- slurm_scripts/speed/<br>- docker/Dockerfile.llama-train-environment</p> <p>Detailed reproduction instructions are provided in the repository README. The fastest rerun path is to use the packaged CodeBERT prediction JSON files as inputs to analysis/batch_testing/simulation.py. The full regeneration path rebuilds the dataset, fine-tunes the risk predictors, runs inference on the eval and test splits, and then reruns the simulator.</p> |
| format | Digital resource |
| id | zenodo_https___doi_org_10_5281_zenodo_19355932 |
| institution | Zenodo |
| language | |
| publishDate | 2026 |
| publisher | Zenodo |
| record_format | zenodo |
| title | Replication Package for "Risk-Aware Batch Testing for Performance Regression Detection" |
| url | https://doi.org/10.5281/zenodo.19355932 |
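
The abstract contrasts a size-based strategy (Fixed-Size Batching) with strategies driven by per-commit risk scores. The sketch below illustrates that contrast only in spirit. `fixed_size_batches` follows the plain meaning of the name, while `risk_budget_batches` is a hypothetical rule invented here for illustration (close a batch once its accumulated risk reaches a budget); it is not the paper's definition of RASB, RAPB, or RATB, and it does not use the package's `simulation.py`. The risk values are made up and are not taken from the dataset or the prediction JSON files.

```python
from typing import List, Sequence


def fixed_size_batches(risks: Sequence[float], size: int) -> List[List[int]]:
    """Group commit indices into consecutive batches of at most `size` commits."""
    if size < 1:
        raise ValueError("size must be at least 1")
    indices = list(range(len(risks)))
    return [indices[i:i + size] for i in range(0, len(indices), size)]


def risk_budget_batches(risks: Sequence[float], budget: float) -> List[List[int]]:
    """Close a batch as soon as its accumulated risk reaches `budget`.

    Hypothetical rule for illustration only; not a strategy defined in the paper.
    """
    if budget <= 0:
        raise ValueError("budget must be positive")
    batches: List[List[int]] = []
    current: List[int] = []
    accumulated = 0.0
    for index, risk in enumerate(risks):
        current.append(index)
        accumulated += risk
        if accumulated >= budget:
            batches.append(current)
            current = []
            accumulated = 0.0
    if current:
        batches.append(current)  # flush the trailing partial batch
    return batches


# Made-up risk scores in chronological commit order (assumption, not real data).
example_risks = [0.05, 0.10, 0.70, 0.05, 0.05, 0.05, 0.90, 0.10]
fsb = fixed_size_batches(example_risks, size=4)
rbb = risk_budget_batches(example_risks, budget=0.8)
print(fsb)  # [[0, 1, 2, 3], [4, 5, 6, 7]]
print(rbb)  # [[0, 1, 2], [3, 4, 5, 6], [7]]
```

With the fixed-size rule, the high-risk commit at index 2 waits for the batch to fill. With the risk-budget rule, the batch closes as soon as that commit arrives, which is the kind of cost versus timeliness trade-off the paper's simulator is described as evaluating.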