Utilities for downloading OSV data, enriching vulnerabilities with a recidivism metric, and cloning referenced source repositories locally.
Copy the default config and edit your local paths:
    cp recidivism.default.ini recidivism.ini

Both scripts read settings from recidivism.ini. If that file is missing, the
scripts print guidance and fall back to recidivism.default.ini.
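
The fallback behaviour described above could look roughly like the sketch below. This is not the scripts' actual code: the function name `load_settings` and the `[paths]` section used in the usage note are hypothetical, and only the two file names come from this README.

```python
import configparser
import sys
from pathlib import Path


def load_settings(config_dir="."):
    """Read recidivism.ini, falling back to recidivism.default.ini.

    Hypothetical helper: the real scripts may structure this differently.
    """
    base = Path(config_dir)
    local = base / "recidivism.ini"
    default = base / "recidivism.default.ini"

    if local.exists():
        chosen = local
    else:
        # Print guidance, then continue with the shipped defaults.
        print(
            f"{local.name} not found; copy {default.name} to {local.name} "
            "and edit your local paths. Using defaults for now.",
            file=sys.stderr,
        )
        chosen = default

    parser = configparser.ConfigParser()
    # read_file raises if the chosen file is missing, unlike parser.read(),
    # which silently skips files it cannot open.
    with chosen.open(encoding="utf-8") as handle:
        parser.read_file(handle)
    return parser, chosen
```

A caller would then read values with something like `parser.get("paths", "osv_dir")`, where the section and key names are placeholders for whatever recidivism.default.ini actually defines.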
    python scripts/enrich_osv_recidivism.py \
        --output data/osv_recidivism.jsonl

This script:
- downloads the OSV dump (OSV-all.zip by default),
- extracts all vulnerabilities,
- computes a recidivism metric using CWE recurrence and repository/fix history,
- appends recidivism details to each vulnerability and writes JSONL output.
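
To make the enrichment step concrete, here is a minimal sketch of counting CWE recurrence across records and appending the result to each one. It is an illustration under assumptions, not the script's real metric: it reads CWE IDs from `database_specific.cwe_ids` (where GitHub-sourced OSV records keep them), ignores repository and fix history entirely, and the output key `cwe_repeat_counts` and both function names are hypothetical.

```python
import json
from collections import Counter


def _cwe_ids(record):
    """Return the set of CWE IDs on a record (assumed field location)."""
    specific = record.get("database_specific") or {}
    return set(specific.get("cwe_ids") or [])


def enrich(records):
    """Attach a naive CWE recurrence count to each record."""
    # First pass: how many records mention each CWE at all.
    seen = Counter()
    for record in records:
        seen.update(_cwe_ids(record))

    # Second pass: a CWE "repeats" once for every other record that has it.
    enriched = []
    for record in records:
        repeats = {cwe: seen[cwe] - 1 for cwe in sorted(_cwe_ids(record))}
        enriched.append({**record, "recidivism": {"cwe_repeat_counts": repeats}})
    return enriched


def write_jsonl(records, path):
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record) + "\n")
```

Two passes keep the count independent of record order, at the cost of holding every record in memory; a full OSV dump may call for a streaming variant.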
    python scripts/clone_osv_repositories.py \
        --osv-dir data/osv_dump \
        --target-dir data/repos \
        --update-existing

This script scans OSV vulnerabilities for GitHub source references and
clones/updates local copies for research workflows (organized as
<target-dir>/<owner>/<repo>).
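
As a rough illustration of the scanning step, the sketch below pulls GitHub owner/repo pairs out of an OSV record's `references` list and maps them onto the `<target-dir>/<owner>/<repo>` layout. The regular expression, the `NON_REPO_OWNERS` exclusion set, and both function names are assumptions made for this example; the actual script may recognise references differently.

```python
import re
from pathlib import Path

# Matches the owner and repository segments of a GitHub URL.
GITHUB_REPO = re.compile(
    r"https?://github\.com/([A-Za-z0-9_.-]+)/([A-Za-z0-9_.-]+)"
)

# URL prefixes under github.com that are not repositories (assumed list).
NON_REPO_OWNERS = {"advisories"}


def github_repos(record):
    """Return sorted, de-duplicated (owner, repo) pairs from a record."""
    found = set()
    for ref in record.get("references") or []:
        match = GITHUB_REPO.match(ref.get("url", ""))
        if not match:
            continue
        owner, repo = match.group(1), match.group(2)
        if owner in NON_REPO_OWNERS:
            continue
        if repo.endswith(".git"):
            repo = repo[: -len(".git")]
        found.add((owner, repo))
    return sorted(found)


def clone_path(target_dir, owner, repo):
    """Build the local path <target-dir>/<owner>/<repo>."""
    return Path(target_dir) / owner / repo
```

Commit, issue, and pull-request URLs all reduce to the same owner/repo pair, so a repository referenced several times is only cloned once.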
    python scripts/cleanup_empty_repos.py --path data/repos --dry-run

The script cleanup_empty_repos.py deletes empty repository directories left behind by the cloning process. These appear when the upstream repository no longer exists or has been made private. The command above performs a dry run and makes no permanent changes.

    python scripts/cleanup_empty_repos.py --path data/repos --yes

This command runs the script and removes the empty directories without prompting for confirmation.
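
The dry-run versus delete distinction can be sketched as follows. This assumes that "empty" means a `<owner>/<repo>` directory with no entries at all; the real script may use a different test (for example, a clone containing only a `.git` directory), and the function names here are hypothetical.

```python
from pathlib import Path


def find_empty_repos(root):
    """List <root>/<owner>/<repo> directories that contain nothing."""
    empty = []
    for owner in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        for repo in sorted(p for p in owner.iterdir() if p.is_dir()):
            if not any(repo.iterdir()):
                empty.append(repo)
    return empty


def cleanup(root, dry_run=True):
    """Report empty repo directories, removing them unless dry_run is set."""
    empty = find_empty_repos(root)
    for repo in empty:
        if dry_run:
            print(f"would remove {repo}")
        else:
            # rmdir refuses to delete a non-empty directory, which guards
            # against removing a clone that gained content in the meantime.
            repo.rmdir()
            print(f"removed {repo}")
    return empty
```

Defaulting `dry_run` to `True` means a caller has to opt in to deletion explicitly.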
    python scripts/generate_recidivism_scores.py \
        --input data/osv_recidivism.jsonl \
        --output-dir data/scores

This script:
- scans osv_recidivism.jsonl for all vulnerabilities,
- calculates a recidivism score for each vulnerability based on CWE/repository recurrence,
- generates individual JSON files in data/scores/<vulnerability_id>.json containing:
  - list of CWEs and repositories referenced in the vulnerability
  - CWE and repository repeat counts
  - raw recidivism score
  - base and adjusted severity scores
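
The per-vulnerability output step might be sketched like this. Everything below is illustrative: the input keys `cwe_repeat_counts` and `repo_repeat_counts`, the output field names, and the raw score (a plain sum of repeat counts) are placeholders, since this README does not specify the real formula. Base and adjusted severity are left out for the same reason.

```python
import json
from pathlib import Path


def write_score_files(input_path, output_dir):
    """Write one <vulnerability_id>.json per record in a JSONL file."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    with open(input_path, encoding="utf-8") as handle:
        for line in handle:
            if not line.strip():
                continue  # tolerate blank lines
            record = json.loads(line)
            details = record.get("recidivism") or {}
            cwe_repeats = details.get("cwe_repeat_counts") or {}
            repo_repeats = details.get("repo_repeat_counts") or {}
            score = {
                "id": record["id"],
                "cwes": sorted(cwe_repeats),
                "repositories": sorted(repo_repeats),
                "cwe_repeat_counts": cwe_repeats,
                "repo_repeat_counts": repo_repeats,
                # Placeholder formula, NOT the script's actual score.
                "raw_recidivism_score": sum(cwe_repeats.values())
                + sum(repo_repeats.values()),
            }
            target = out / f"{record['id']}.json"
            target.write_text(json.dumps(score, indent=2), encoding="utf-8")
            written.append(target)
    return written
```

Writing one file per vulnerability keeps each score addressable by ID without loading the full JSONL file.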