feat(cli): native regression-baseline sync/replay/mint verbs #724

Open
lewisjared wants to merge 10 commits into main from feat/regression-cli

@lewisjared lewisjared commented Jun 13, 2026

Contributor

Description

Adds the CLI workflow for RFC 0005 native regression baselines, building on the native-baseline primitives from #720.

New `ref test-cases` verbs:

  • `mint`: runs each test case, stores its native snapshot in the writable store, and authors the committed manifest's `native` block (CI-only; needs store credentials).
  • `replay`: materialises the committed native blobs from the read store, re-runs `build_execution_result`, and asserts the regenerated bundle matches the in-repo copy via the tolerant comparator.
  • `sync`: fetches native blobs referenced by committed manifests into the local store cache (idempotent).

`run` now captures via `capture_execution` and refreshes only the committed block of `manifest.json`, leaving the `native` block mint-owned. The example diagnostic registers its NetCDF output so it has a native artefact to mint, and a roundtrip integration test exercises the capture -> mint -> replay loop end to end.

Checklist

Please confirm that this pull request has done the following:

  • Tests added
  • Documentation added (where applicable)
  • Changelog item added to changelog/

Introduce `climate_ref_core.paths.safe_path`, a single lexical +
containment guard for joining an untrusted relative path onto a trusted
base directory. It rejects empty/absolute paths, `..` components and NUL
bytes, and (when a base is given) confirms the resolved target still
lives under that base.

Replace the ad-hoc `_validate_path_segment` (fragment.py) and
`_validate_path_containment` (reingest.py) helpers with it, and apply it
when materialising native blobs and loading manifests so a hand-edited or
hostile manifest cannot escape the destination directory. Validate
sha256 digests as 64-char lowercase hex before they are used to build a
store or native path.
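
For readers unfamiliar with the two-layer guard described above, the following is a minimal sketch of the idea, not the actual `climate_ref_core.paths.safe_path` implementation. The names `safe_path_sketch` and `is_sha256_hex`, their signatures, and the error messages are assumptions made for illustration; the sketch also only handles POSIX-style paths.

```python
import re
from pathlib import Path, PurePosixPath
from typing import Optional

# 64 lowercase hex characters, as the commit message describes.
_SHA256_RE = re.compile(r"[0-9a-f]{64}")


def safe_path_sketch(untrusted: str, base: Optional[Path] = None) -> Path:
    """Join an untrusted relative path onto a trusted base (illustrative only)."""
    # Lexical layer: reject inputs that are unsafe regardless of the filesystem.
    if not untrusted:
        raise ValueError("empty path")
    if "\x00" in untrusted:
        raise ValueError("NUL byte in path")
    candidate = PurePosixPath(untrusted)
    if candidate.is_absolute():
        raise ValueError("absolute path")
    if ".." in candidate.parts:
        raise ValueError("'..' component in path")
    if base is None:
        return Path(untrusted)
    # Containment layer: resolve symlinks and confirm the target is still under base.
    resolved_base = base.resolve()
    target = (resolved_base / untrusted).resolve()
    if not target.is_relative_to(resolved_base):  # Python 3.9+
        raise ValueError("path escapes the base directory")
    return target


def is_sha256_hex(digest: str) -> bool:
    """True only for exactly 64 lowercase hexadecimal characters."""
    return _SHA256_RE.fullmatch(digest) is not None
```

The lexical checks catch hostile input even when no base is supplied, while the resolve step catches escapes that only appear on disk, such as a symlink inside the destination that points elsewhere.
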
Add three `ref test-cases` verbs for the native-baseline workflow:

- `mint`: run each case, store its native snapshot in the writable store,
  and author the committed manifest's `native` block (CI, needs creds).
- `replay`: materialise the committed native blobs from the read store,
  re-run `build_execution_result`, and assert the regenerated bundle
  matches the in-repo copy via the tolerant comparator.
- `sync`: fetch native blobs referenced by committed manifests into the
  local store cache (idempotent).

`run` now captures via `capture_execution` and refreshes only the
committed block of `manifest.json`, leaving the native block mint-owned.
The example diagnostic registers its NetCDF output so it has a native
artefact to mint, and a roundtrip integration test exercises the
capture/mint/replay loop end to end.
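
The "idempotent" property claimed for `sync` can be illustrated with a content-addressed cache. This is a sketch under stated assumptions, not the project's store code: `sync_blobs`, the `fetch` callable, and the one-file-per-digest cache layout are all hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Callable, Iterable


def sync_blobs(
    digests: Iterable[str],
    cache_dir: Path,
    fetch: Callable[[str], bytes],
) -> int:
    """Fetch blobs missing from cache_dir; return how many were downloaded."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    downloaded = 0
    for digest in digests:
        target = cache_dir / digest
        # A cached file whose content hashes to its own name needs no work,
        # so running the sync a second time downloads nothing.
        if target.is_file() and hashlib.sha256(target.read_bytes()).hexdigest() == digest:
            continue
        data = fetch(digest)
        if hashlib.sha256(data).hexdigest() != digest:
            raise ValueError(f"store returned wrong content for {digest}")
        # Write to a temporary name first so an interrupted run never leaves
        # a truncated file under the final name.
        tmp = target.with_suffix(".tmp")
        tmp.write_bytes(data)
        tmp.replace(target)
        downloaded += 1
    return downloaded
```

Because the digest is both the lookup key and the integrity check, a corrupted cache entry is re-fetched on the next run rather than trusted.
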
Consolidating path-safety into safe_path dropped the separator check
that _validate_path_segment enforced for provider and diagnostic slugs.
A slug containing '/' passed the lexical layer (only '..', absolute and
NUL are rejected) and could restructure the output fragment tree.

Add a single_segment option to safe_path that rejects '/' and '\'
separators and a bare '.', and apply it at the slug call sites in
assign_execution_fragment.
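
The regression described above is easy to reproduce in isolation. The sketch below uses a hypothetical helper, `require_single_segment`, standing in for the `single_segment` option; it is not the real `safe_path` signature, only an illustration of the extra checks the commit message lists.

```python
from pathlib import PurePosixPath


def require_single_segment(slug: str) -> str:
    """Return slug unchanged if it is exactly one safe path component."""
    if not slug or "\x00" in slug:
        raise ValueError("slug must be non-empty and NUL-free")
    if "/" in slug or "\\" in slug:
        raise ValueError(f"slug contains a path separator: {slug!r}")
    if slug in {".", ".."}:
        raise ValueError(f"slug is a relative-directory marker: {slug!r}")
    return slug


# Without the separator check, a slug containing '/' passes the '..'/absolute/NUL
# rules yet silently adds a directory level to the fragment tree.
parts = PurePosixPath("results", "provider/extra", "diagnostic").parts
# parts has four components instead of the intended three.
```

A slug is meant to name one directory level, so rejecting separators outright is simpler and safer than trying to normalise them.
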
* origin/main:
  chore: cleanup
  docs: add changelog entry for #723
  test: silence deprecation warnings and speed up the climate-ref suite

codecov Bot commented Jun 13, 2026

Codecov Report

❌ Patch coverage is 72.56944% with 79 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...ages/climate-ref/src/climate_ref/cli/test_cases.py | 68.03% | 60 Missing and 18 partials ⚠️ |
| ...s/climate-ref-core/src/climate_ref_core/testing.py | 66.66% | 1 Missing ⚠️ |

| Flag | Coverage Δ | |
|---|---|---|
| core | 92.49% <72.12%> (-0.80%) | ⬇️ |
| providers | 91.80% <100.00%> (+<0.01%) | ⬆️ |

| Files with missing lines | Coverage Δ | |
|---|---|---|
| ...ges/climate-ref-core/src/climate_ref_core/paths.py | 100.00% <100.00%> (ø) | |
| ...f-core/src/climate_ref_core/regression/__init__.py | 100.00% <ø> (ø) | |
| ...ef-core/src/climate_ref_core/regression/capture.py | 94.11% <100.00%> (+0.17%) | ⬆️ |
| ...f-core/src/climate_ref_core/regression/manifest.py | 100.00% <100.00%> (ø) | |
| ...-ref-core/src/climate_ref_core/regression/store.py | 97.40% <100.00%> (+0.03%) | ⬆️ |
| ...ate-ref-example/src/climate_ref_example/example.py | 90.00% <100.00%> (+0.20%) | ⬆️ |
| ...ckages/climate-ref/src/climate_ref/cli/__init__.py | 96.96% <ø> (+2.02%) | ⬆️ |
| ...s/climate-ref/src/climate_ref/executor/fragment.py | 86.15% <100.00%> (-0.81%) | ⬇️ |
| ...s/climate-ref/src/climate_ref/executor/reingest.py | 93.89% <100.00%> (-0.19%) | ⬇️ |
| ...s/climate-ref-core/src/climate_ref_core/testing.py | 89.47% <66.66%> (-0.29%) | ⬇️ |

... and 1 more

... and 1 file with indirect coverage changes

The test-cases verbs (fetch, run, sync, replay, mint) operate only on
test artifacts -- the filesystem, the native store, and committed
manifests -- and run diagnostics through TestCaseRunner, which never
persists executions to the database. Like the already-listed `list`,
they only touch the database to build the provider registry.

Add them to the read-only command registry so they no longer copy the
SQLite file and churn the backups directory on every invocation when
migrations are pending. Add a test covering each verb plus negative
cases for data-modifying commands.
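
The effect of the registry can be pictured as a simple membership test made before any backup is taken. Everything in this sketch is hypothetical: the real registry's name, shape, and the backup hook are not shown in this PR text, and `("datasets", "ingest")` is used only as an assumed example of a data-modifying command.

```python
from typing import Tuple

# Assumed representation: each command is a tuple of CLI path components.
READ_ONLY_COMMANDS = frozenset(
    {
        ("test-cases", "list"),
        ("test-cases", "fetch"),
        ("test-cases", "run"),
        ("test-cases", "sync"),
        ("test-cases", "replay"),
        ("test-cases", "mint"),
    }
)


def should_backup_database(command: Tuple[str, ...], migrations_pending: bool) -> bool:
    """Back up the SQLite file only when a migration could actually change it."""
    if not migrations_pending:
        return False
    # Read-only commands never persist executions, so copying the database
    # for them only adds files to the backups directory.
    return command not in READ_ONLY_COMMANDS
```

With this shape, `should_backup_database(("test-cases", "replay"), True)` is `False`, while a command outside the registry still triggers a backup when migrations are pending.
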
lewisjared force-pushed the feat/regression-cli branch from 38415b2 to a1170a0 on June 13, 2026 23:43
…eplay

Treat the byte-exact committed-bundle integrity check as advisory during
replay instead of a hard failure: when digests differ, warn and let the
tolerant bundle comparison decide whether the baseline is equivalent.
Replay and mint now report native/committed file counts, and report
"reconciled" vs "matched" so the byte-level warning is explained.

Drive the bundle comparison from the shared COMMITTED_BUNDLE_FILES
constant (now exported from climate_ref_core.regression) rather than a
hardcoded tuple, and sharpen integrity mismatch messages to include the
on-disk path and both digests.

Regenerate the example global-mean-timeseries native baselines so the
manifests carry native entries.
Three fixes to make the branch green:

- replay: import build_native_store from climate_ref_core.regression.store
  (matching mint/sync) so the offline roundtrip test's mock applies; the
  package re-export escaped the patch and reached the real remote store.

- example default: revert the native migration of
  annual_mean_global_mean_timeseries.nc back to a committed regression
  artifact; the offline RegressionValidator copies the committed bundle and
  does not materialise native files.

- example test: skip test_validate_test_case_regression for any case whose
  committed regression dir ships no *.nc, since offline CMEC-bundle
  validation must open it. Self-heals once a baseline is committed (cmip7).
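
The first fix above is an instance of a general `unittest.mock` rule: a patch replaces one attribute on one module, so a second name bound to the same function (such as a package-level re-export) is unaffected. The self-contained demonstration below uses made-up modules (`demo_pkg`, `demo_pkg.store`) and hypothetical `replay_*` functions in place of the real `climate_ref_core.regression` package.

```python
import sys
import types
from unittest import mock


def _real_build_native_store() -> str:
    return "real remote store"


# Hypothetical package with a submodule and a package-level re-export.
store = types.ModuleType("demo_pkg.store")
store.build_native_store = _real_build_native_store
pkg = types.ModuleType("demo_pkg")
pkg.store = store
pkg.build_native_store = store.build_native_store  # the re-export
sys.modules["demo_pkg"] = pkg
sys.modules["demo_pkg.store"] = store


def replay_before_fix() -> str:
    # Looks the name up on the package: the re-export still points at the
    # original function, so the patch below is bypassed.
    from demo_pkg import build_native_store
    return build_native_store()


def replay_after_fix() -> str:
    # Looks the name up on the module the test patches.
    from demo_pkg.store import build_native_store
    return build_native_store()


with mock.patch("demo_pkg.store.build_native_store", return_value="fake store"):
    before = replay_before_fix()  # "real remote store"
    after = replay_after_fix()    # "fake store"
```

Importing from the same module path that the test patches, as `mint` and `sync` already did, keeps all three verbs covered by a single patch target.
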