feat(cli): native regression-baseline sync/replay/mint verbs #724
Open
lewisjared wants to merge 10 commits into
Introduce `climate_ref_core.paths.safe_path`, a single lexical + containment guard for joining an untrusted relative path onto a trusted base directory. It rejects empty/absolute paths, `..` components and NUL bytes, and (when a base is given) confirms the resolved target still lives under that base. Replace the ad-hoc `_validate_path_segment` (fragment.py) and `_validate_path_containment` (reingest.py) helpers with it, and apply it when materialising native blobs and loading manifests so a hand-edited or hostile manifest cannot escape the destination directory. Validate sha256 digests as 64-char lowercase hex before they are used to build a store or native path.
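The two layers described above (a lexical check, then a containment check on the resolved target) might be sketched roughly as follows. This is not the project's implementation: `safe_join`, `is_sha256_hex`, their signatures, and the choice of `ValueError` are all assumptions made for illustration.

```python
from __future__ import annotations

import re
from pathlib import Path, PurePosixPath, PureWindowsPath

_SHA256_RE = re.compile(r"[0-9a-f]{64}")


def is_sha256_hex(digest: str) -> bool:
    """Return True only for exactly 64 lowercase hex characters."""
    return _SHA256_RE.fullmatch(digest) is not None


def safe_join(relative: str, base: Path | None = None) -> Path:
    """Hypothetical stand-in for a safe_path-style guard (illustrative only)."""
    # Lexical layer: reject empty paths, NUL bytes, absolute paths and '..'.
    if not relative:
        raise ValueError("empty path")
    if "\x00" in relative:
        raise ValueError("NUL byte in path")
    if PurePosixPath(relative).is_absolute() or PureWindowsPath(relative).is_absolute():
        raise ValueError(f"absolute path not allowed: {relative!r}")
    # Treat backslashes as separators too, so the '..' check is conservative.
    if ".." in relative.replace("\\", "/").split("/"):
        raise ValueError(f"'..' component not allowed: {relative!r}")
    if base is None:
        return Path(relative)

    # Containment layer: resolve symlinks and confirm the target is under base.
    base_resolved = base.resolve()
    target = (base_resolved / relative).resolve()
    if not target.is_relative_to(base_resolved):
        raise ValueError(f"{relative!r} escapes {base_resolved}")
    return target
```

Resolving both the base and the target before comparing them is what lets the containment layer catch symlink escapes that a purely lexical check would miss.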
Add three `ref test-cases` verbs for the native-baseline workflow:

- `mint`: run each case, store its native snapshot in the writable store, and author the committed manifest's `native` block (CI, needs creds).
- `replay`: materialise the committed native blobs from the read store, re-run `build_execution_result`, and assert the regenerated bundle matches the in-repo copy via the tolerant comparator.
- `sync`: fetch native blobs referenced by committed manifests into the local store cache (idempotent).

`run` now captures via `capture_execution` and refreshes only the committed block of `manifest.json`, leaving the native block mint-owned. The example diagnostic registers its NetCDF output so it has a native artefact to mint, and a roundtrip integration test exercises the capture/mint/replay loop end to end.
Consolidating path-safety into safe_path dropped the separator check that _validate_path_segment enforced for provider and diagnostic slugs. A slug containing '/' passed the lexical layer (only '..', absolute and NUL are rejected) and could restructure the output fragment tree. Add a single_segment option to safe_path that rejects '/' and '\' separators and a bare '.', and apply it at the slug call sites in assign_execution_fragment.
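The gap described above can be illustrated with a small sketch. The function names `lexical_ok` and `single_segment_ok` are hypothetical and only mirror the rules stated in the commit message; they are not the project's API.

```python
def lexical_ok(path: str) -> bool:
    """Lexical layer as described: reject only empty, absolute, '..' and NUL."""
    if not path or "\x00" in path or path.startswith("/"):
        return False
    return ".." not in path.split("/")


def single_segment_ok(slug: str) -> bool:
    """Stricter check for slugs: additionally reject separators and a bare '.'."""
    if not lexical_ok(slug):
        return False
    if "/" in slug or "\\" in slug:
        return False
    return slug != "."


# A slug with a separator passes the lexical layer but not the slug check.
slug = "provider/nested"
checks = (lexical_ok(slug), single_segment_ok(slug))
```

Under these assumptions, `checks` is `(True, False)`: the slug is lexically safe as a relative path, yet it would add an extra directory level to the fragment tree if it were accepted as a single segment.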
* origin/main:
  chore: cleanup
  docs: add changelog entry for #723
  test: silence deprecation warnings and speed up the climate-ref suite
The test-cases verbs (fetch, run, sync, replay, mint) operate only on test artifacts -- the filesystem, the native store, and committed manifests -- and run diagnostics through TestCaseRunner, which never persists executions to the database. Like the already-listed `list`, they only touch the database to build the provider registry. Add them to the read-only command registry so they no longer copy the SQLite file and churn the backups directory on every invocation when migrations are pending. Add a test covering each verb plus negative cases for data-modifying commands.
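As a rough illustration of the idea, a read-only registry can be a set of command names that is consulted before any backup is taken. The names `READ_ONLY_COMMANDS` and `needs_backup`, and the `(group, verb)` key shape, are assumptions for this sketch rather than the project's actual structure.

```python
# Hypothetical registry of (command group, verb) pairs that never modify the database.
READ_ONLY_COMMANDS = frozenset(
    ("test-cases", verb)
    for verb in ("list", "fetch", "run", "sync", "replay", "mint")
)


def needs_backup(group: str, verb: str, migrations_pending: bool) -> bool:
    """Back up the database only when a data-modifying command will trigger migrations."""
    if not migrations_pending:
        return False
    return (group, verb) not in READ_ONLY_COMMANDS
```

With a check of this shape, a read-only verb skips the copy even when migrations are pending, while any command outside the registry still gets a backup first.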
38415b2 to a1170a0
…eplay Treat the byte-exact committed-bundle integrity check as advisory during replay instead of a hard failure: when digests differ, warn and let the tolerant bundle comparison decide whether the baseline is equivalent. Replay and mint now report native/committed file counts, and report "reconciled" vs "matched" so the byte-level warning is explained. Drive the bundle comparison from the shared COMMITTED_BUNDLE_FILES constant (now exported from climate_ref_core.regression) rather than a hardcoded tuple, and sharpen integrity mismatch messages to include the on-disk path and both digests. Regenerate the example global-mean-timeseries native baselines so the manifests carry native entries.
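The advisory-then-tolerant flow might look something like the sketch below. It assumes the bundle file is JSON, that "matched" means byte-identical and "reconciled" means the bytes differ but the tolerant comparison passes, and it uses a made-up relative tolerance; `tolerant_equal` and `compare_bundle_file` are hypothetical names, not the project's comparator.

```python
import hashlib
import json
import math
import warnings


def _is_number(value) -> bool:
    # bool is a subclass of int, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def tolerant_equal(a, b, rel_tol: float = 1e-6) -> bool:
    """Recursively compare JSON-like values, allowing small numeric differences."""
    if isinstance(a, dict) and isinstance(b, dict):
        return a.keys() == b.keys() and all(tolerant_equal(a[k], b[k], rel_tol) for k in a)
    if isinstance(a, list) and isinstance(b, list):
        return len(a) == len(b) and all(tolerant_equal(x, y, rel_tol) for x, y in zip(a, b))
    if _is_number(a) and _is_number(b):
        return math.isclose(a, b, rel_tol=rel_tol)
    return a == b


def compare_bundle_file(committed: bytes, regenerated: bytes) -> str:
    """Return 'matched' or 'reconciled'; raise if the contents genuinely differ."""
    committed_digest = hashlib.sha256(committed).hexdigest()
    regenerated_digest = hashlib.sha256(regenerated).hexdigest()
    if committed_digest == regenerated_digest:
        return "matched"
    # The byte-level mismatch is advisory: warn, then let the tolerant check decide.
    warnings.warn(
        f"digest mismatch: committed={committed_digest} regenerated={regenerated_digest}"
    )
    if tolerant_equal(json.loads(committed), json.loads(regenerated)):
        return "reconciled"
    raise AssertionError("regenerated bundle differs from the committed baseline")
```

Keeping the digest check as a warning means formatting or floating-point noise no longer fails replay on its own, while a real change in values still raises.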
Three fixes to make the branch green:

- replay: import build_native_store from climate_ref_core.regression.store (matching mint/sync) so the offline roundtrip test's mock applies; the package re-export escaped the patch and reached the real remote store.
- example default: revert the native migration of annual_mean_global_mean_timeseries.nc back to a committed regression artifact; the offline RegressionValidator copies the committed bundle and does not materialise native files.
- example test: skip test_validate_test_case_regression for any case whose committed regression dir ships no *.nc, since offline CMEC-bundle validation must open it. Self-heals once a baseline is committed (cmip7).
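The replay fix comes down to the general `unittest.mock` rule that a patch only replaces the name on the module it targets. The sketch below reproduces that effect with two throwaway modules, `demo_store` and `demo_pkg`, which are stand-ins invented for illustration; how the real code and test import the function is not shown in this PR text.

```python
import sys
import types
from unittest import mock


def _real_build_native_store() -> str:
    return "real remote store"


# Stand-in for the module that defines the function.
demo_store = types.ModuleType("demo_store")
demo_store.build_native_store = _real_build_native_store
sys.modules["demo_store"] = demo_store

# Stand-in for a package that re-exports the same function object.
demo_pkg = types.ModuleType("demo_pkg")
demo_pkg.build_native_store = demo_store.build_native_store
sys.modules["demo_pkg"] = demo_pkg


def replay_via_reexport() -> str:
    from demo_pkg import build_native_store  # the re-export is not patched

    return build_native_store()


def replay_via_store() -> str:
    from demo_store import build_native_store  # the patched name

    return build_native_store()


with mock.patch("demo_store.build_native_store", return_value="mock store"):
    results = (replay_via_reexport(), replay_via_store())
```

Here `results` is `("real remote store", "mock store")`: the re-exported name still points at the original function, so only the caller that imports from the patched module sees the mock.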
Description
Adds the CLI workflow for RFC 0005 native regression baselines, building on the native-baseline primitives from #720.
New `ref test-cases` verbs:

- `mint`: runs each test case, stores its native snapshot in the writable store, and authors the committed manifest's `native` block (CI-only; needs store credentials).
- `replay`: materialises the committed native blobs from the read store, re-runs `build_execution_result`, and asserts the regenerated bundle matches the in-repo copy via the tolerant comparator.
- `sync`: fetches native blobs referenced by committed manifests into the local store cache (idempotent).

`run` now captures via `capture_execution` and refreshes only the committed block of `manifest.json`, leaving the native block mint-owned. The example diagnostic registers its NetCDF output so it has a native artefact to mint, and a roundtrip integration test exercises the capture -> mint -> replay loop end to end.

Checklist
Please confirm that this pull request has done the following:
changelog/