feat(regression): CI coupling gate for test case baselines by lewisjared · Pull Request #727 · Climate-REF/climate-ref

lewisjared · 2026-06-15T11:05:46Z

Description

Adds the CI coupling gate for RFC 0005 regression baselines (PR-3), stacked on #724.

For each test case the gate decides — purely from what changed relative to the base branch — how CI should verify the baseline: skip, replay (cheap, anonymous, against cached native blobs), execute (full re-run when test_case_version is bumped), or fail (an unauthorised or unverifiable change). decide_coupling is a pure function; ref test-cases ci-gate maps a PR's changed-file list onto its inputs.
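A minimal sketch of what a pure decision function of this shape could look like. The flag names, the `Inputs` container and the precedence order are assumptions made for illustration. They are stand-ins, not the real `decide_coupling` signature or the full matrix in `climate_ref_core.regression.gate`.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    """Stand-in for the four outcomes named in the description."""

    SKIP = "skip"
    REPLAY = "replay"
    EXECUTE = "execute"
    FAIL = "fail"


@dataclass(frozen=True)
class Inputs:
    """Hypothetical change flags for one test case, relative to the base branch."""

    version_bumped: bool = False      # test_case_version differs from base
    committed_changed: bool = False   # committed baseline bundle differs from base
    native_changed: bool = False      # set of native blobs differs from base
    extraction_changed: bool = False  # extraction code changed
    head_has_native: bool = False     # head manifest lists native blobs
    integrity_ok: bool = True         # bundle and catalog still match the manifest


def decide(i: Inputs) -> Action:
    # Assumed precedence: an unverifiable state fails before anything else.
    if not i.integrity_ok:
        return Action.FAIL
    # A version bump authorises a full re-run.
    if i.version_bumped:
        return Action.EXECUTE
    # Assumption: a committed baseline change without a bump is unauthorised.
    if i.committed_changed:
        return Action.FAIL
    # Replay only when the head manifest has native blobs to replay.
    if (i.native_changed or i.extraction_changed) and i.head_has_native:
        return Action.REPLAY
    # Nothing relevant changed, or native was removed (de-mint): skip.
    return Action.SKIP
```

Because the function takes plain values and touches neither git nor the filesystem, every row of the matrix can be asserted in a unit test without a repository checkout.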

This branch also folds in the review-driven hardening:

- Native is opt-in. REPLAY is selected only when the head manifest actually has native blobs (fork contributors can't mint, so an empty native set is a permanent valid state). A de-mint (native removed, committed unchanged) warns and SKIPs rather than failing.
- Baselines are coupled to their inputs. New optional Manifest.catalog_hash; a catalog.yaml change without regenerating the baseline now FAILs the gate instead of silently passing. The .catalog_hash sidecar is retired in favour of the manifest as the single coupling record.
- Fail-closed on the committed dimension: managed-manifest deletion, committed-bundle drift, and input-catalog drift all FAIL.
- Comparator fixes: no longer conflates bool with int/float; non-zero default atol so values at zero aren't held to bit-exactness.
- Robustness/cleanup: broaden extraction-change detection to the core surfaces behind build_execution_result; reject unknown native-store URL schemes; reject a bare . in safe_path; mark ci-gate read-only; cache the pooch manager; memoise per-provider source-root resolution.
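To illustrate the two comparator fixes: in Python `bool` is a subclass of `int`, so a naive numeric check treats `True` and `1` as equal, and a zero absolute tolerance forces values near zero to match exactly. The sketch below is not the code in `compare.py`; the function name and the tolerance values are placeholders chosen for the example.

```python
import math


def values_match(expected, actual, rtol=1e-6, atol=1e-12):
    """Compare two scalar values the way a baseline comparator might."""
    # bool is a subclass of int, so handle it before the numeric branch:
    # True must only match True, never 1 or 1.0.
    if isinstance(expected, bool) or isinstance(actual, bool):
        return type(expected) is type(actual) and expected == actual
    if isinstance(expected, (int, float)) and isinstance(actual, (int, float)):
        # A relative tolerance alone collapses to zero when both values are
        # near zero, so a small non-zero abs_tol is needed there.
        return math.isclose(expected, actual, rel_tol=rtol, abs_tol=atol)
    # Everything else (strings, None, ...) falls back to plain equality.
    return expected == actual
```

With `atol=0.0` the pair `(0.0, 1e-15)` is reported as a mismatch, which is the bit-exactness problem the non-zero default avoids.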

A background doc (docs/background/regression-baselines.md) explains the two-layer baseline model, the lifecycle verbs, and the gate, with lifecycle and gate-decision mermaid diagrams.

Note: the actual CI workflow YAML, the R2 write backend, and the <RECIPE_RUN> sanitisation for real ESMValTool are deferred to later PRs by design.

Checklist

Please confirm that this pull request has done the following:

Tests added
Documentation added (where applicable)
Changelog item added to changelog/

Add `ref test-cases ci-gate`, which decides how CI should verify each regression test case against the base branch: replay the cached native baseline, execute a full re-run when `test_case_version` is bumped, skip unchanged cases, or fail an unauthorised baseline change. The decision lives in the pure `climate_ref_core.regression.gate` (`decide_coupling`, `Action`, `GateDecision`, `paths_under`) so the full matrix is unit-testable offline; the CLI maps the git diff and on-disk state onto its arguments. Extract `Manifest.loads` from `Manifest.load` so the base-branch manifest parses from `git show` output. The gate fails closed: deleting a managed manifest, drifting the committed bundle from its manifest digests, or re-minting native blobs without a version bump are all caught rather than silently skipped.
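A rough sketch of why a `loads`/`load` split helps here: the base-branch manifest is not on disk, so it has to be parsed from the text that `git show <ref>:<path>` prints. The class below is hypothetical. The JSON serialisation, the `native` field and the `manifest_at_ref` helper are assumptions for the example, not the real `Manifest` API.

```python
import json
import subprocess
from dataclasses import dataclass, field
from pathlib import Path
from typing import Optional


@dataclass(frozen=True)
class ManifestSketch:
    """Hypothetical, heavily reduced manifest."""

    test_case_version: int
    native: dict = field(default_factory=dict)

    @classmethod
    def loads(cls, text: str) -> "ManifestSketch":
        # Parse from a string so callers need no file on disk.
        raw = json.loads(text)
        return cls(
            test_case_version=raw["test_case_version"],
            native=raw.get("native", {}),
        )

    @classmethod
    def load(cls, path: Path) -> "ManifestSketch":
        # The file-based entry point is a thin wrapper around loads.
        return cls.loads(Path(path).read_text(encoding="utf-8"))


def manifest_at_ref(ref: str, path: str) -> Optional[ManifestSketch]:
    """Read the manifest as it exists at a git ref, or None if it is absent."""
    proc = subprocess.run(
        ["git", "show", f"{ref}:{path}"],
        capture_output=True,
        text=True,
        check=False,
    )
    if proc.returncode != 0:
        return None  # git could not show the file at that ref
    return ManifestSketch.loads(proc.stdout)
```

Returning `None` for a missing base manifest lets the caller tell a newly seeded test case apart from a changed one.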

Address review findings on the CI coupling gate, resolving five design decisions reached during review:

- Gate emits REPLAY only when the head manifest has native blobs to replay (seeding / native-changed / extraction-changed). An empty native set is a permanent valid state, so a de-mint (native removed, committed unchanged) warns and SKIPs rather than failing. The native axis is documented as not fail-closed.
- Couple each baseline to its inputs: new optional `Manifest.catalog_hash`, checked by the gate via `catalog_integrity_ok`. A `catalog.yaml` change without regenerating the baseline now FAILs instead of silently skipping. The `.catalog_hash` sidecar is retired in favour of the manifest, which is the single coupling record; run/mint populate it via a shared helper.
- Comparator no longer conflates `bool` with int/float, and the default absolute tolerance is a small non-zero placeholder so values at zero are not held to bit-exactness.
- Broaden extraction-change detection to the core surfaces behind build_execution_result (pycmec, output_files, diagnostics), reject unknown native-store URL schemes instead of coercing them to local paths, reject a bare '.' in safe_path, add ci-gate to the read-only command set, cache the pooch manager, and memoise per-provider source-root resolution in the gate.

Adds a background doc (regression baselines + CI coupling gate) with lifecycle and gate-decision mermaid diagrams, and unit coverage for every new branch.
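The input coupling described above can be pictured as a digest check. This is only a sketch: the use of SHA-256, the helper names, and the choice to treat an unrecorded hash as passing (because the field is optional) are all assumptions, not the behaviour of the real `catalog_integrity_ok`.

```python
import hashlib
from typing import Optional


def catalog_digest(catalog_bytes: bytes) -> str:
    """Hex digest of the raw catalog file contents."""
    return hashlib.sha256(catalog_bytes).hexdigest()


def catalog_matches(recorded: Optional[str], catalog_bytes: bytes) -> bool:
    """True when the catalog on disk is the one the baseline was built from."""
    # Assumption: a manifest with no recorded hash has nothing to check.
    if recorded is None:
        return True
    return recorded == catalog_digest(catalog_bytes)
```

Under this scheme, editing `catalog.yaml` without regenerating the baseline leaves the recorded digest stale, so the check returns `False` and the gate has a concrete reason to fail.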

codecov · 2026-06-15T11:09:27Z

Codecov Report

❌ Patch coverage is 94.49541% with 12 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
...ages/climate-ref/src/climate_ref/cli/test_cases.py	90.16%	9 Missing and 3 partials ⚠️

Flag	Coverage Δ
core	`92.57% <94.49%> (+0.08%)`	⬆️
providers	`91.80% <ø> (ø)`


Files with missing lines	Coverage Δ
...ges/climate-ref-core/src/climate_ref_core/paths.py	`100.00% <100.00%> (ø)`
...f-core/src/climate_ref_core/regression/__init__.py	`100.00% <100.00%> (ø)`
...ef-core/src/climate_ref_core/regression/compare.py	`97.36% <100.00%> (+0.03%)`	⬆️
...e-ref-core/src/climate_ref_core/regression/gate.py	`100.00% <100.00%> (ø)`
...f-core/src/climate_ref_core/regression/manifest.py	`100.00% <100.00%> (ø)`
...-ref-core/src/climate_ref_core/regression/store.py	`97.56% <100.00%> (+0.15%)`	⬆️
...s/climate-ref-core/src/climate_ref_core/testing.py	`89.83% <100.00%> (+0.36%)`	⬆️
...ckages/climate-ref/src/climate_ref/cli/__init__.py	`96.96% <ø> (ø)`
...ages/climate-ref/src/climate_ref/cli/test_cases.py	`82.58% <90.16%> (+2.12%)`	⬆️


lewisjared added 3 commits June 15, 2026 12:56

docs(changelog): add fragment for CI coupling gate

4308af0

lewisjared wants to merge 3 commits into feat/regression-cli from feat/regression-coupling-gate
