| Main author: | |
|---|---|
| Material type: | Digital resource |
| Language: | |
| Published: | Zenodo, 2026 |
| Links: | https://doi.org/10.5281/zenodo.19349629 |
Table of contents:
- <div dir="auto"> <h1>Automated Process ASE</h1> </div> <p dir="auto">This directory is the unified automation entry point for the four measurement modules, allowing the full workflow to be executed with a single command.</p> <div dir="auto"> <h2>Modules</h2> </div> <ul> <li><code>Updatechk</code></li> <li><code>Uniqueness</code></li> <li><code>Vulnerability</code></li> <li><code>Maliciousness</code></li> </ul> <p dir="auto">The workflow is currently orchestrated by <code>run_pipeline.py</code>, and each module writes its results to its own <code>res</code> directory.</p> <div dir="auto"> <h2>Directory Structure</h2> </div> <ul> <li><code>run_pipeline.py</code>: main entry point responsible for argument parsing, module scheduling, and manifest generation</li> <li><code>pipeline_config.py</code>: default input paths and output directory layout</li> <li><code>Updatechk/res</code></li> <li><code>Uniqueness/link_dup/res</code></li> <li><code>Vulnerability/res</code></li> <li><code>Maliciousness/res</code></li> <li><code>manifests</code>: pipeline manifest files generated for each run</li> </ul> <div dir="auto"> <h2>What <code>manifests</code> Is For</h2> </div> <p dir="auto"><code>manifests/</code> stores a traceable snapshot of each pipeline run. 
File names use the following format:</p> <ul> <li><code>manifests/&lt;run_id&gt;_pipeline_manifest.json</code></li> </ul> <p dir="auto">Each manifest includes:</p> <ul> <li>input arguments for the run (<code>dataset_root</code>, <code>metadata_yaml</code>, <code>workspace_root</code>)</li> <li>a snapshot of the resolved paths for the four modules and output directories</li> <li>the status of each module (<code>completed</code> / <code>skipped</code>)</li> <li>key output paths for each module</li> <li>execution script lists and runtime information for some modules</li> </ul> <p dir="auto">Use cases:</p> <ul> <li>reproducing experiments</li> <li>investigating why a module produced no output</li> <li>comparing differences between runs</li> </ul> <div dir="auto"> <h2>Running</h2> </div> <p dir="auto">From the <code>Automated-process-ase</code> directory:</p> <div dir="auto"> <pre>python run_pipeline.py --run-id ase_full_20260326 --clean</pre> </div> <div dir="auto"> <h3>Common Arguments</h3> </div> <ul> <li><code>--dataset-root</code>: root directory of the repositories to be analyzed</li> <li><code>--metadata-yaml</code>: marketplace metadata file</li> <li><code>--workspace-root</code>: root output directory</li> <li><code>--run-id</code>: run ID for the current execution, used to name the manifest</li> <li><code>--max-projects</code>: process only the first N projects for quick testing</li> <li><code>--clean</code>: clear each module's <code>res</code> directory before running</li> <li><code>--verbose</code>: print full logs from child scripts</li> <li><code>--skip-updatechk</code>: skip the <code>Updatechk</code> module</li> <li><code>--skip-uniqueness</code>: skip the <code>Uniqueness</code> module</li> <li><code>--skip-vulnerability</code>: skip the <code>Vulnerability</code> module</li> <li><code>--skip-maliciousness</code>: skip the <code>Maliciousness</code> module</li> <li><code>--uniqueness-run-featured</code>: run 
the uniqueness featured collection script first</li> </ul> <div dir="auto"> <h2>Current Execution Order</h2> </div> <p dir="auto">Default order:</p> <ol> <li><code>Updatechk</code></li> <li><code>Uniqueness</code></li> <li><code>Vulnerability</code></li> <li><code>Maliciousness</code></li> </ol> <p dir="auto">Any module can be skipped with the corresponding <code>--skip-*</code> flag.</p> <div dir="auto"> <h2>Output</h2> </div> <p dir="auto">When the pipeline finishes, it prints:</p> <ul> <li><code>pipeline_manifest</code></li> <li>the root <code>res</code> directory for each module</li> </ul> <p dir="auto">Example result files under those directories:</p> <ul> <li><code>Updatechk/res/api_check_res.csv</code></li> <li><code>Uniqueness/link_dup/res/summary_duplicate_links.csv</code></li> <li><code>Vulnerability/res/validated_res.csv</code></li> <li><code>Maliciousness/res/tool_poisoning_res.csv</code></li> </ul> <div dir="auto"> <h2>Notes</h2> </div> <ul> <li>By default, child-script stdout is hidden to reduce terminal noise, while stderr output such as progress bars is still shown. Pass <code>--verbose</code> to print the full child-script logs.</li> </ul>
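
<p dir="auto">The argument handling, skip flags, and manifest naming described above can be sketched as follows. This is a minimal illustration, not the actual contents of <code>run_pipeline.py</code>: it assumes the script uses <code>argparse</code>, that <code>manifests/</code> sits under the workspace root, and that the manifest is a flat JSON object. The function names, the manifest keys, and the default values for <code>--run-id</code> and <code>--workspace-root</code> are hypothetical. The sketch does not run any module; it only records the status a successful run would report.</p>

```python
import argparse
import json
from pathlib import Path

# Module names and their default order are taken from the README.
# Function names, manifest keys, and default values are hypothetical.
MODULES = ["Updatechk", "Uniqueness", "Vulnerability", "Maliciousness"]


def build_parser() -> argparse.ArgumentParser:
    """Build a CLI that mirrors the flags listed under Common Arguments."""
    parser = argparse.ArgumentParser(prog="run_pipeline.py")
    parser.add_argument("--dataset-root")
    parser.add_argument("--metadata-yaml")
    parser.add_argument("--workspace-root", default=".")   # assumed default
    parser.add_argument("--run-id", default="manual_run")  # assumed default
    parser.add_argument("--max-projects", type=int)
    parser.add_argument("--clean", action="store_true")
    parser.add_argument("--verbose", action="store_true")
    parser.add_argument("--uniqueness-run-featured", action="store_true")
    for name in MODULES:
        # Produces --skip-updatechk, --skip-uniqueness, and so on.
        parser.add_argument(f"--skip-{name.lower()}", action="store_true")
    return parser


def module_statuses(args: argparse.Namespace) -> dict:
    """Map each module, in default order, to 'skipped' or 'completed'.

    No module is executed here; a real pipeline would mark a module
    'completed' only after it has actually finished.
    """
    return {
        name: "skipped" if getattr(args, f"skip_{name.lower()}") else "completed"
        for name in MODULES
    }


def write_manifest(args: argparse.Namespace, statuses: dict) -> Path:
    """Write manifests/<run_id>_pipeline_manifest.json and return its path."""
    manifest_dir = Path(args.workspace_root) / "manifests"  # assumed location
    manifest_dir.mkdir(parents=True, exist_ok=True)
    path = manifest_dir / f"{args.run_id}_pipeline_manifest.json"
    manifest = {
        "run_id": args.run_id,
        "inputs": {
            "dataset_root": args.dataset_root,
            "metadata_yaml": args.metadata_yaml,
            "workspace_root": args.workspace_root,
        },
        "modules": statuses,
    }
    path.write_text(json.dumps(manifest, indent=2), encoding="utf-8")
    return path


def main(argv=None) -> Path:
    """Parse arguments, derive module statuses, and write the manifest."""
    args = build_parser().parse_args(argv)
    return write_manifest(args, module_statuses(args))
```

<p dir="auto">Under these assumptions, calling <code>main(["--run-id", "ase_full_20260326", "--skip-uniqueness"])</code> would write <code>manifests/ase_full_20260326_pipeline_manifest.json</code> below the current directory, with <code>Uniqueness</code> marked <code>skipped</code> and the other three modules marked <code>completed</code>. Loading two such files with <code>json.load</code> is one simple way to compare runs.</p>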