feat(regression): CI coupling gate for test case baselines #727
Open
lewisjared wants to merge 3 commits into
Add `ref test-cases ci-gate`, which decides how CI should verify each regression test case against the base branch: replay the cached native baseline, execute a full re-run when `test_case_version` is bumped, skip unchanged cases, or fail an unauthorised baseline change. The decision lives in the pure `climate_ref_core.regression.gate` (`decide_coupling`, `Action`, `GateDecision`, `paths_under`) so the full matrix is unit-testable offline; the CLI maps the git diff and on-disk state onto its arguments. Extract `Manifest.loads` from `Manifest.load` so the base-branch manifest parses from `git show` output. The gate fails closed: deleting a managed manifest, drifting the committed bundle from its manifest digests, or re-minting native blobs without a version bump are all caught rather than silently skipped.
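The four outcomes described above can be sketched as a pure function over a summary of what changed. This is only an illustration of the idea, not the signature of `decide_coupling`: the `Change` dataclass, its field names, the function name `sketch_decide`, and the order of the checks are all assumptions made for this example.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    SKIP = "skip"
    REPLAY = "replay"
    EXECUTE = "execute"
    FAIL = "fail"


@dataclass(frozen=True)
class Change:
    """Hypothetical summary of what a PR changed for one test case."""

    manifest_deleted: bool = False
    bundle_matches_manifest: bool = True
    version_bumped: bool = False
    committed_baseline_changed: bool = False
    extraction_changed: bool = False


def sketch_decide(change: Change) -> Action:
    """Illustrative decision order (assumed, not the real gate's)."""
    # Fail closed: a deleted manifest or a bundle that has drifted from
    # its manifest digests cannot be verified at all.
    if change.manifest_deleted or not change.bundle_matches_manifest:
        return Action.FAIL
    # A version bump authorises a new baseline, verified by a full re-run.
    if change.version_bumped:
        return Action.EXECUTE
    # The committed baseline changed without a bump: unauthorised.
    if change.committed_baseline_changed:
        return Action.FAIL
    # Extraction code changed: re-check cheaply against cached native blobs.
    if change.extraction_changed:
        return Action.REPLAY
    # Nothing relevant changed.
    return Action.SKIP
```

Because the function takes plain values and touches neither git nor the filesystem, every row of the decision matrix can be exercised in an offline unit test, which is the property the commit message highlights.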
Address review findings on the CI coupling gate, resolving five design decisions reached during review:

- Gate emits REPLAY only when the head manifest has native blobs to replay (seeding / native-changed / extraction-changed). An empty native set is a permanent valid state, so a de-mint (native removed, committed unchanged) warns and SKIPs rather than failing. The native axis is documented as not fail-closed.
- Couple each baseline to its inputs: new optional `Manifest.catalog_hash`, checked by the gate via `catalog_integrity_ok`. A `catalog.yaml` change without regenerating the baseline now FAILs instead of silently skipping. The `.catalog_hash` sidecar is retired in favour of the manifest, which is the single coupling record; run/mint populate it via a shared helper.
- Comparator no longer conflates `bool` with int/float, and the default absolute tolerance is a small non-zero placeholder so values at zero are not held to bit-exactness.
- Broaden extraction-change detection to the core surfaces behind `build_execution_result` (pycmec, output_files, diagnostics), reject unknown native-store URL schemes instead of coercing them to local paths, reject a bare '.' in `safe_path`, add ci-gate to the read-only command set, cache the pooch manager, and memoise per-provider source-root resolution in the gate.

Adds a background doc (regression baselines + CI coupling gate) with lifecycle and gate-decision mermaid diagrams, and unit coverage for every new branch.
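The comparator point is easy to miss in Python, where `bool` is a subclass of `int` and `True == 1` evaluates to true. A minimal sketch of the idea follows; the function name `values_match` and the tolerance defaults are hypothetical and are not taken from the PR.

```python
import math
from numbers import Real


def values_match(expected, actual, *, rtol=1e-6, atol=1e-12):
    """Compare two scalar values (hypothetical helper, assumed defaults)."""
    # bool is a subclass of int, so handle it first: a bool only ever
    # matches another bool, and the comparison is exact.
    if isinstance(expected, bool) or isinstance(actual, bool):
        return type(expected) is type(actual) and expected == actual
    # Real numbers are compared with a relative and an absolute tolerance.
    # With abs_tol=0.0 a value of exactly zero could only match itself,
    # which is why a small non-zero absolute tolerance is used.
    if isinstance(expected, Real) and isinstance(actual, Real):
        return math.isclose(expected, actual, rel_tol=rtol, abs_tol=atol)
    # Everything else (strings, None, ...) falls back to equality.
    return expected == actual
```

With these assumed defaults, `values_match(True, 1)` is false while `values_match(1, 1.0)` is true, and `values_match(0.0, 1e-15)` passes only because `atol` is non-zero.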
Description
Adds the CI coupling gate for RFC 0005 regression baselines (PR-3), stacked on #724.
For each test case the gate decides, purely from what changed relative to the base branch, how CI should verify the baseline: `skip`, `replay` (cheap, anonymous, against cached native blobs), `execute` (full re-run when `test_case_version` is bumped), or `fail` (an unauthorised or unverifiable change). `decide_coupling` is a pure function; `ref test-cases ci-gate` maps a PR's changed-file list onto its inputs.

This branch also folds in the review-driven hardening:

- Couple each baseline to its inputs via `Manifest.catalog_hash`; a `catalog.yaml` change without regenerating the baseline now FAILs the gate instead of silently passing. The `.catalog_hash` sidecar is retired in favour of the manifest as the single coupling record.
- The comparator no longer conflates `bool` with int/float; the default `atol` is non-zero so values at zero aren't held to bit-exactness.
- Broaden extraction-change detection to the core surfaces behind `build_execution_result`; reject unknown native-store URL schemes; reject a bare `.` in `safe_path`; mark `ci-gate` read-only; cache the pooch manager; memoise per-provider source-root resolution.

A background doc (`docs/background/regression-baselines.md`) explains the two-layer baseline model, the lifecycle verbs, and the gate, with lifecycle and gate-decision mermaid diagrams.

Checklist
Please confirm that this pull request has done the following:
changelog/