Delivery Review + Clean-Room Intent Capture by gbrbks · Pull Request #107 · BitRaptors/Archie

gbrbks · 2026-07-02T21:04:42Z

Delivery Review + Clean-Room Intent Capture

Adds an AI delivery review to Archie — a PR-time check that reconciles intent, blueprint, and diff into one verdict: did the change build what was asked, and did it break anything? Plus a clean-room intent-capture layer that feeds it trustworthy, transparent intent.

Non-blocking by contract: every review posts an advisory PR comment and never fails a check or blocks an edit. The developer runs zero Archie commands in the happy path.

What's in it

Delivery review pipeline — edge-A (intent ⋈ diff completeness) · behavioral + conformance code review (intent-aware) · edge-C (requirement ⋈ invariant conflict) · a single editor gate (falsification-required, floor/anchor/dedup, cannot invent) · a delivery-verdict comment.
Three surfaces — deep-scan cold-read gate, sync.py review (manual), and the PR gate (GitHub Action, auto).
Clean-room intent capture — the coding agent no longer authors its own yardstick. Hooks Archie already installs log the user's verbatim planning turns; an isolated agent, blind to the code, turns them into .archie/intent.json criteria. Committed, versioned, human-ratifiable, auto-synthesized at PR time if nobody ran it.
Transparency — the verdict comment renders the criteria list with ✅/❌, provenance + confidence + a trust label (confirmed vs unconfirmed), and a correction footer. show/synthesize/confirm-intent are opt-in.
CI POC wiring — delivery_review.py runs via the existing setup-archie-intent-review.sh workflow (base-ref execution; secrets not exposed to PR-editable code).

Guarantees

Doesn't block developers. Capture is a silent log append; the review only comments. The enforcement hook's blocking exit 2 is preserved (proven by real subprocess tests).
Breaks the circularity. Criteria are authored blind to the implementation — verified end-to-end; a test enforces the prompt carries no code.

Quality

181 tests pass (python3 3.9.6); verify_sync clean (52 scripts, canonical ↔ npm assets).
Built subagent-driven, TDD per task, with per-task reviews and adversarial final reviews — which caught and fixed a data-loss wipe, several security holes, a swallowed enforcement exit 2, and an always-0/N completeness bug before merge.
Provenance/originality audit in the design doc (§11) — borrowed vocabulary renamed, mechanisms re-provenanced to public prior art.

Deferred (documented follow-ups)

Linear/Jira ticket fetch (seam built) · the full contract→tracer→challenger invariant specialist (single-pass shipped) · sync auto-trigger hook · verdict→check-state/AIS teeth · richer verdict styling · honest met/unknown split (needs edge-A per-criterion verdicts).

Design + plans: docs/archie-delivery-review-design.md, docs/archie-branch-intent-design.md, docs/archie-intent-capture-design.md.

🤖 Generated with Claude Code

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Implement selector.py — non-reviewing router that intersects changed files with blueprint anchors to pick Lane-2 specialists. Pure function, zero dependencies. Includes TDD test suite (3 tests: invariant selection on cited file, data-lifecycle selection on store file, empty result on untouched file). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Replace raw substring branch in _hit() with path-segment-aware check to prevent short anchors like "db" matching unrelated files such as services/redis_db_client.py. Add regression test covering this case. Import stays as sys.path form — archie package init pulls in tomllib (Python 3.11+) which is unavailable on this env (Python 3.9). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
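The difference between a raw substring test and a path-segment-aware one can be sketched as follows. This is only an illustration: the name `anchor_hits` and the prefix-by-segments rule are assumptions, and the real `_hit()` in selector.py may apply different rules.

```python
from pathlib import PurePosixPath


def anchor_hits(anchor: str, path: str) -> bool:
    """Hypothetical helper: True when `anchor` matches `path` on whole segments."""
    anchor_parts = PurePosixPath(anchor).parts
    path_parts = PurePosixPath(path).parts
    if not anchor_parts:
        return False
    # Compare whole leading segments, never raw substrings.
    return path_parts[: len(anchor_parts)] == anchor_parts


print(anchor_hits("db", "services/redis_db_client.py"))  # False
print(anchor_hits("db", "db/models.py"))                 # True
```

Comparing tuples of segments means a short anchor can only match a directory or file that carries exactly that name.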

Editor gate: the single reliability pass every surface routes through. Validates schema, floor-gates, anchor-checks, and dedups against store. Never invents — every confirmed item is an input item unchanged in id. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Implement consumers() to invert scanner's import graph and compute transitive blast radius. Pure stdlib, no dependencies, handles module stem extraction and BFS cycle-free walk over dependencies. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
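A minimal sketch of inverting an import graph and walking it breadth-first to get a transitive blast radius. The graph shape (module name mapped to the modules it imports) and the function names are assumptions for illustration; the scanner's actual output format is not shown in this PR.

```python
from collections import deque
from typing import Dict, List, Set


def invert(imports: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Turn 'module -> modules it imports' into 'module -> modules importing it'."""
    reverse: Dict[str, Set[str]] = {}
    for importer, imported in imports.items():
        for target in imported:
            reverse.setdefault(target, set()).add(importer)
    return reverse


def blast_radius(imports: Dict[str, List[str]], changed: str) -> Set[str]:
    """All modules that depend on `changed`, directly or transitively."""
    reverse = invert(imports)
    seen: Set[str] = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for consumer in reverse.get(current, ()):
            # The seen-set keeps the walk finite even when imports form a cycle.
            if consumer != changed and consumer not in seen:
                seen.add(consumer)
                queue.append(consumer)
    return seen


graph = {"cli": ["api"], "api": ["service"], "service": ["store"], "store": []}
print(sorted(blast_radius(graph, "store")))  # ['api', 'cli', 'service']
```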

Change review() default arg from run=run_verifier (definition-time bind) to run=None with call-time lookup so monkeypatch.setattr works correctly. Strengthen test_review_mocked to assert a real finding flows through the lambda; add test_parse_findings_drops_missing_falsification negative test. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
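The binding problem described here is general Python behaviour and can be reproduced in a few lines. The function bodies below are stand-ins, not the real `review()` or `run_verifier`; the reassignment at the end plays the role of `monkeypatch.setattr`.

```python
def run_verifier(prompt: str) -> str:
    return "real:" + prompt


def review_bound(prompt, run=run_verifier):
    # The default was captured when the function was defined,
    # so later patches to the module-level name are ignored.
    return run(prompt)


def review(prompt, run=None):
    # Resolve the default at call time so a patched run_verifier is used.
    if run is None:
        run = run_verifier
    return run(prompt)


# Stand-in for monkeypatch.setattr(module, "run_verifier", fake).
run_verifier = lambda prompt: "fake:" + prompt

print(review_bound("p"))  # real:p
print(review("p"))        # fake:p
```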

Append three functions to archie/standalone/intent.py to resolve ticket IDs from mixed sources and persist branch intent records.

- ticket_ids_from: Extract unique ticket IDs from branch, PR body, commit messages via regex
- load_branch_record: Load .archie/intent/<branch>.json
- save_branch_record: Save record, never downgrading confidence

Records are ranked {inferred: 0, commits: 1, pr_body/prompt: 2, linear: 3}. Higher-confidence sources never get overwritten by lower ones.

Tests: 3/3 passing (test_intent_ladder.py). Task 7 regression check: 3/3 passing (test_intent.py).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
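The "never downgrade" rule can be sketched with the rank table from this commit message. The record shape (a dict with a `source` key) and the helper name `pick_record` are assumptions; in this sketch equal ranks overwrite, whereas a later commit in this PR changes equal-rank saves to merge.

```python
RANK = {"inferred": 0, "commits": 1, "pr_body": 2, "prompt": 2, "linear": 3}


def pick_record(existing, incoming):
    """Hypothetical helper: decide which record survives a save."""
    if existing is None:
        return incoming
    if RANK[incoming["source"]] < RANK[existing["source"]]:
        return existing  # a lower-confidence source never replaces a higher one
    return incoming


stored = {"source": "linear", "ticket_ids": ["ARC-12"]}
print(pick_record(stored, {"source": "commits", "ticket_ids": []})["source"])  # linear
```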

Append aggregate_verdict() function to reconcile.py for computing {intent_completeness, breaks, conflicts, gate_signal} from confirmed findings per intent spec. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Add evidence-schema fields (kind, edge:"B", anchor, assumptions, falsification) to step-5b-risk.md emission contract. Add contract test + fixture that verify every risk finding satisfies evidence_schema.has_evidence_fields. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Add `should_review` (intake eligibility) and `render_verdict` (Markdown verdict) pure functions in delivery_review.py; wire GitHub Action step. Full PR-gate orchestration deferred to a later task. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Count distinct criterion_id values when computing unmet acceptance criteria in aggregate_verdict; fall back to per-finding count when criterion_id is absent. Thread criterion_id onto edge-A findings in parse_edge_a so downstream dedup works. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
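A rough sketch of the counting rule, assuming findings are dicts with a `kind` field and an optional `criterion_id`. Treating each id-less finding as one unmet criterion is my reading of the "per-finding fallback"; the real `aggregate_verdict` may differ.

```python
def count_unmet(findings):
    """Hypothetical helper: unmet criteria, counted once per criterion_id."""
    unmet = [f for f in findings if f.get("kind") == "unmet"]
    distinct_ids = {f["criterion_id"] for f in unmet if f.get("criterion_id")}
    # Findings without an id cannot be grouped, so each one counts on its own.
    without_id = sum(1 for f in unmet if not f.get("criterion_id"))
    return len(distinct_ids) + without_id


findings = [
    {"kind": "unmet", "criterion_id": "ac-1"},
    {"kind": "unmet", "criterion_id": "ac-1"},  # same criterion, second finding
    {"kind": "unmet"},                          # no id: per-finding fallback
    {"kind": "drift", "criterion_id": "ac-2"},  # not an unmet finding
]
print(count_unmet(findings))  # 2
```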

…verdict

A1: add extract_json_obj to evidence_schema, a brace-depth scan that resumes after failed parses so stray balanced-brace prose no longer silently drops all findings; replace find/rfind/json.loads in behavioral_review + reconcile.
A2: add coerce_confidence to evidence_schema so null/string confidence degrades to 0.0 instead of crashing; wired into make_finding and both parsers.
A3: aggregate_verdict now counts intent_drift findings, includes drift in gate_signal formula (0.1 weight), and returns "drift" key.
A4: guard None acceptance_criteria in build_edge_a_prompt and aggregate_verdict with (... or []) so a stored null doesn't crash iteration.

Tests: 23 passing (17 existing + 6 new).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
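The A1 and A2 behaviour can be approximated with the standard library. Note the swap: instead of a hand-written brace-depth scan, this sketch retries `json.JSONDecoder.raw_decode` at each `{`, which gives the same "resume after a failed parse" effect. The function bodies are illustrative and not the code in evidence_schema.

```python
import json


def extract_json_obj(text: str):
    """Return the first JSON object found in `text`, or None."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            obj, _end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            obj = None  # braces in prose: keep scanning instead of giving up
        if isinstance(obj, dict):
            return obj
        start = text.find("{", start + 1)
    return None


def coerce_confidence(value) -> float:
    """Degrade null, strings and other non-numbers to 0.0 instead of raising."""
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return 0.0
    return float(value)


reply = 'Notes {see above} then {"findings": [{"confidence": null}]}'
parsed = extract_json_obj(reply)
print(parsed)  # {'findings': [{'confidence': None}]}
print(coerce_confidence(parsed["findings"][0]["confidence"]))  # 0.0
```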

…precision

B1 editor_gate: widen dedup key to (file,line,kind) so distinct findings on different lines of the same file are not silently dropped; coerce string anchor.line to int before membership test; treat line=None + file-in-changed-set as a file-level finding (keep).
B2 selector: remove interior-substring clause from _hit so anchor src/api/ no longer matches vendor/src/api/x.py; drop break after first invariant match so all matching invariant ids surface in the reason string.
B3 reachability: replace stem-only reverse graph with path-suffix resolution for path-ish imports (containing /), avoiding cross-directory basename collisions; bare module names still use stem fallback via __stem__: keys; remove dead stem assignment.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
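A small sketch of the widened dedup key from B1, assuming each finding is a dict with `kind` and an `anchor` holding `file` and `line`. The helper names are hypothetical.

```python
def dedup_key(finding):
    """Hypothetical key: (file, line, kind), with string lines coerced to int."""
    anchor = finding.get("anchor") or {}
    line = anchor.get("line")
    if isinstance(line, str) and line.strip().isdigit():
        line = int(line)
    return (anchor.get("file"), line, finding.get("kind"))


def dedup(findings):
    seen, kept = set(), []
    for finding in findings:
        key = dedup_key(finding)
        if key not in seen:
            seen.add(key)
            kept.append(finding)
    return kept


items = [
    {"kind": "behavioral_break", "anchor": {"file": "a.py", "line": 10}},
    {"kind": "behavioral_break", "anchor": {"file": "a.py", "line": "10"}},  # duplicate
    {"kind": "behavioral_break", "anchor": {"file": "a.py", "line": 20}},    # distinct line
]
print(len(dedup(items)))  # 2
```

With a key of only the file name, the third finding would have been dropped along with the true duplicate.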

…rank merge, path collision

C1: add _TICKET_DENYLIST to reject CVE/UTF/SHA/RFC/ISO etc. from ticket extraction.
C2: use O_NOFOLLOW + unlink-before-write to prevent symlink write-through; set 0o600 perms.
C3: equal-rank saves now merge ticket_ids/goals/ac/non_goals instead of overwriting.
C4: _record_path uses slug+sha1[:8] suffix so "a/b" and "a__b" never collide.

New tests: ticket denylist, symlink refused, perms 0o600, equal-rank merge, path no-collision.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
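The C2 and C4 ideas can be sketched with POSIX file flags and a hash suffix. Everything named here (`record_name`, `write_private`, the `__` slug separator, the `.json` extension) is an assumption for illustration, and `O_NOFOLLOW` is only available on POSIX systems.

```python
import hashlib
import os
import re


def record_name(branch: str) -> str:
    """Hypothetical file name: readable slug plus a short hash of the raw branch."""
    slug = re.sub(r"[^A-Za-z0-9._-]+", "__", branch)
    digest = hashlib.sha1(branch.encode("utf-8")).hexdigest()[:8]
    return f"{slug}-{digest}.json"


def write_private(path: str, text: str) -> None:
    """Write `text` to `path` without ever writing through a symlink."""
    try:
        os.unlink(path)  # removes a symlink itself, never its target
    except FileNotFoundError:
        pass
    flags = os.O_WRONLY | os.O_CREAT | os.O_EXCL | getattr(os, "O_NOFOLLOW", 0)
    fd = os.open(path, flags, 0o600)  # owner read/write only
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(text)


# "a/b" and "a__b" share a slug but differ in the hash suffix.
print(record_name("a/b") != record_name("a__b"))  # True
```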

…store

D1: add _is_safe_ref guard; reject dashed refs in detect_base and changed_files; git diff uses '--' end-of-options separator; merge-base ref validated via rev-parse --verify; add changed_files_result for structured ok/reason signal.
D2: _merge_findings_into_store and gate_and_merge guard against bare-list store (backs up to findings.json.corrupt sidecar, never silently wipes); all store writes are atomic via tempfile.mkstemp + os.replace in the same dir.

Tests: 10 new tests (diff_basis x7, finalize x3) + existing 7 all green (17/17).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
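Both hardening steps can be illustrated with the standard library. The function names and the exact `git diff` arguments below are assumptions; only the techniques (rejecting option-like refs, ending options with `--`, and writing through `tempfile.mkstemp` plus `os.replace`) come from the commit message.

```python
import json
import os
import tempfile


def is_safe_ref(ref: str) -> bool:
    """Hypothetical guard: refuse refs that git could parse as an option."""
    return bool(ref) and not ref.startswith("-")


def diff_command(base: str):
    if not is_safe_ref(base):
        raise ValueError(f"unsafe ref: {base!r}")
    # The trailing "--" ends option parsing for anything appended afterwards.
    return ["git", "diff", "--name-only", f"{base}...HEAD", "--"]


def atomic_write_json(path: str, payload) -> None:
    """Write JSON so readers see either the old file or the new one, never half."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(payload, handle)
        os.replace(tmp_path, path)  # atomic when both paths share a filesystem
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Creating the temporary file in the destination directory is what keeps `os.replace` a same-filesystem rename.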

…aping

- E1: `should_review` now uses `or 0` / `or []` guards so None-valued `changed_files` and absent/null `labels` from GitHub JSON no longer raise.
- E2: Added `_sanitize()` (html.escape + @ neutralization via zero-width space); applied to every model-derived field in `render_verdict` so LLM-generated problem_statement / kind / anchor.file cannot inject HTML comment markers, raw HTML, or live @mentions into the PR comment body.
- Tests: 5 new cases covering None inputs, marker injection (exactly 1 real marker survives), @mention neutralization, and HTML escaping. Existing loose `"1" in md` assertion tightened to `"1 break(s)" in md`.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
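The sanitizing step in E2 amounts to two standard operations. This sketch uses the name `sanitize` rather than the real `_sanitize()`, whose exact behaviour may differ.

```python
import html

ZERO_WIDTH_SPACE = "\u200b"


def sanitize(text) -> str:
    """Escape HTML, then break @mentions by inserting a zero-width space."""
    escaped = html.escape(str(text))
    return escaped.replace("@", "@" + ZERO_WIDTH_SPACE)


raw = "<!-- marker --> ping @octocat"
print(sanitize(raw) == "&lt;!-- marker --&gt; ping @\u200boctocat")  # True
```

Because `<` and `>` are escaped, model output can no longer forge the HTML comment marker the workflow uses to find its own comment, and the zero-width space keeps `@name` from notifying anyone.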

…fetch Restructure _run_codex so the finally-unlink is at the outer try level, guaranteeing cleanup on FileNotFoundError, TimeoutExpired, and non-zero returncode. Switch NamedTemporaryFile(delete=False) → mkstemp+os.close. Add two targeted tests that assert both failure modes return "" and leave no temp file. Sync agent_cli.py to npm-package/assets/. Harden archie-check.yml: add least-privilege permissions block (contents:read, pull-requests:write), change fetch-depth 2→0 for full merge-base reachability, and add trust-model comment above delivery-review. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
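The cleanup guarantee can be sketched as an outer `try/finally` around every failure path. This is not the real `_run_codex`: the function name, the convention that the CLI writes its answer to a file passed as the last argument, and the `tmp_dir` parameter are all assumptions made so the example is self-contained.

```python
import os
import subprocess
import tempfile


def run_cli(cmd, prompt, tmp_dir=None, timeout=60):
    """Run `cmd <outfile>`, return the file's text, or "" on any failure."""
    fd, out_path = tempfile.mkstemp(dir=tmp_dir, suffix=".txt")
    os.close(fd)  # only the path is needed; the child process writes the file
    try:
        try:
            result = subprocess.run(
                list(cmd) + [out_path],
                input=prompt,
                capture_output=True,
                text=True,
                timeout=timeout,
            )
        except (FileNotFoundError, subprocess.TimeoutExpired):
            return ""
        if result.returncode != 0:
            return ""
        with open(out_path, encoding="utf-8") as handle:
            return handle.read()
    finally:
        # Runs on every path above, including the early returns.
        if os.path.exists(out_path):
            os.unlink(out_path)
```

Because the `finally` sits at the outer level, a missing binary, a timeout, and a non-zero exit all leave no temporary file behind.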

…alignment

- Add resolve()/build_resolve_prompt() to intent.py: fills goals/acceptance_criteria via one LLM call; no-op when raw is empty
- Wire resolve() into sync_review.run_sync_review before review_edge_a so edge-A always sees populated criteria
- Add severity kwarg (default "medium") to evidence_schema.make_finding
- behavioral_review.parse_findings: severity="high", severity_class="tradeoff_undermined" (advisory, non-blocking)
- reconcile.parse_edge_a: per-kind severity (unmet=high/partial=medium/drift=low), severity_class="tradeoff_undermined"
- step-5b-risk.md: limit emitted kind to behavioral_break|conformance_break (remove schema_drift/mechanical_violation/pattern_divergence)
- Tests: resolve() noop/populate/bad-json, sync_review resolves before edge-A, severity field, non-blocking severity_class, risk taxonomy

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Add tests/test_delivery_integration.py: drives selector → intent.resolve → review_edge_a → behavioral_review → review_edge_c → editor_gate → aggregate_verdict with only the LLM seam mocked, proving no shape-mismatch bugs hide at module boundaries. Second test exercises anchor-gate string/int coercion (line as "999" str) to confirm off-anchor findings are suppressed. Tighten tests/test_selector.py: split reason string on comma before checking inv-1 so the assertion cannot pass for inv-10. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
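The `inv-1` versus `inv-10` pitfall is a plain substring issue and is easy to reproduce. The reason-string format below (comma-separated ids) is an assumption based on the description above.

```python
reason = "inv-10, inv-2"  # hypothetical reason string from the selector

# Loose check: passes by accident because "inv-1" is a prefix of "inv-10".
loose = "inv-1" in reason

# Tight check: split on commas first, then compare whole ids.
ids = [part.strip() for part in reason.split(",")]
tight = "inv-1" in ids

print(loose, tight)  # True False
```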

…cord Was reading PR body from an unset ARCHIE_PR_BODY env var -> completeness always 0/0 in the real workflow. Now sourced from the event payload; branch record keyed on PR head ref. Verified end-to-end with real LLM: 3/6 criteria. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
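In GitHub Actions the webhook payload is available as a JSON file whose path is in the `GITHUB_EVENT_PATH` environment variable, and for pull-request events it carries `pull_request.body` and `pull_request.head.ref`. A minimal reader might look like this; the function name and the returned dict shape are illustrative, not the code in delivery_review.py.

```python
import json
import os


def pr_context(event_path=None):
    """Hypothetical helper: read PR body and head ref from the event payload."""
    empty = {"body": "", "head_ref": ""}
    event_path = event_path or os.environ.get("GITHUB_EVENT_PATH", "")
    try:
        with open(event_path, encoding="utf-8") as handle:
            event = json.load(handle)
    except (OSError, ValueError):
        return empty  # missing or malformed payload: degrade, never raise
    if not isinstance(event, dict):
        return empty
    pr = event.get("pull_request") or {}
    head = pr.get("head") or {}
    # A PR with an empty description has body == null in the payload.
    return {"body": pr.get("body") or "", "head_ref": head.get("ref") or ""}
```

Reading the body from the payload avoids depending on a workflow step having exported it into an environment variable first.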

Add a delivery_review.py step (base-ref exec, same trust model + secret + blueprint prereq as intent_review) to the canonical archie-intent-review.yml that setup-archie-intent-review.sh installs. Running the existing CI setup script now provisions BOTH intent review and delivery review on PRs. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

The merge_specs function now captures each spec's singular ticket_id (in addition to ticket_ids list), ensuring no tickets silently drop when merging. load_committed_intent return type corrected to dict | None to match actual behavior (returns None on absent/malformed files). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…vents

- Implements intent_synthesize.py: regenerates .archie/intent.json from events
- Accepts ONLY the intent-event log, NOT code/diff/conversation (blindness guaranteed)
- Writes unconfirmed spec with provenance (capture_points, captured_at)
- Resynthesize can RETIRE criteria (no scope ratchet) via regeneration not union
- build_synthesis_prompt() asserts "NOT shown the implementation" in synthesis prompt
- parse_synthesis() extracts JSON spec with normalized acceptance_criteria
- synthesize(root, run=None) writes atomically to .archie/intent.json via os.replace
- Zero deps beyond stdlib; imports guarded; tests pass fake run to assert blindness
- 5 tests pass: blindness check, parse_synthesis, unconfirmed spec, retire, no-events

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
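The blindness property is testable because the LLM call is injected. The sketch below is a simplified stand-in, not intent_synthesize.py: the event shape (`{"text": ...}`), the prompt wording beyond the quoted phrase, and the signature taking events directly instead of a repo root are all assumptions.

```python
import json


def build_synthesis_prompt(events):
    """Build the prompt from captured user turns only; no code or diff goes in."""
    turns = "\n".join(f"- {event['text']}" for event in events)
    return (
        "You are NOT shown the implementation.\n"
        "From the planning turns below, return JSON with goals and "
        "acceptance_criteria.\n" + turns
    )


def synthesize(events, run):
    """Regenerate the spec from scratch; nothing from an older spec is unioned in."""
    if not events:
        return None
    spec = json.loads(run(build_synthesis_prompt(events)))
    spec["confirmed"] = False  # a human has not ratified it yet
    return spec


captured = []


def fake_run(prompt):
    captured.append(prompt)  # lets a test inspect exactly what the model saw
    return '{"goals": ["export CSV"], "acceptance_criteria": ["CSV has a header row"]}'


spec = synthesize([{"text": "Add a CSV export with a header row"}], fake_run)
print(spec["confirmed"], "NOT shown the implementation" in captured[0])  # False True
```

Since every call rebuilds the spec from the event log, a criterion that no longer follows from the log simply does not reappear, which is one way to get the "retire" behaviour the commit describes.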

…parency) Add four transparency subcommands to sync.py: cmd_capture_intent appends a user-turn event; cmd_synthesize_intent runs the blind transform; cmd_show_intent pretty-prints goals/criteria/provenance/confirmed; cmd_confirm_intent sets confirmed=true. Includes dispatch registration, _usage() lines, and intent_imports() helper. All tests pass (3 passed). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…ilent SKILL step

- pre-turn.sh: append best-effort user-turn intent event via intent_capture.py (guarded by [ -f .archie/intent_capture.py ]; || true; exit 0 preserved)
- pre-validate.sh: append best-effort edit-transition marker after PYEOF (fires only on non-blocking path; blocking exit 2 still propagates correctly)
- SKILL.md: replace "Capture branch intent" agent-authoring step with "Branch intent (captured automatically)": synthesize-intent / show-intent / confirm-intent commands; removes "author `goals` and concrete" instruction
- Copy mirrors to npm-package/assets/ (verify_sync.py: 52 scripts, PASS)
- Add tests/test_intent_hook_wiring.py (3 tests, all pass)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…mpts Non-goals in the intent spec are now threaded through: (a) merge_specs unions and deduplicates them, (b) build_edge_a_prompt and build_conformance_prompt include a NON-GOALS section when present, (c) intent_brief surfaces them so behavioral reviewer sees them. Empty non_goals lists leave prompts unchanged (backward compatible). Scope creep is now checkable. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…esize fallback

- aggregate_verdict: unaddressed criteria count as unknown (not silently met); return dict gains 'unknown'; met = addressed - unmet only.
- render_verdict(verdict, confirmed, spec=None): provenance+confidence header, per-criterion ✅/❌ list, trust label (human-confirmed vs lower-trust), correction-loop footer. Caller updated to pass spec.
- run_pr_gate: hands-off auto-synthesize fallback: if no .archie/intent.json exists but events were captured, runs blind synthesize() before assembling intent. Non-blocking, always guarded.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…ture Capture _rc=$? immediately after the enforcement heredoc PYEOF so the intent-capture if-block (best-effort, || true) cannot swallow a blocking exit 2 — the script now ends with exit $_rc. Apply the identical fix to both canonical and npm-package mirror copies. Also fix two test-integrity minors: use the hyphenated intent-events.jsonl filename the library actually produces, and tighten the SKILL.md assertion from or to and. Add tests/test_pre_validate_exit_code.py with structural + real execution tests that prove exit 2 survives the appended capture block. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…); update stale tests Replace unknown/addressed logic in aggregate_verdict with silence=met semantics: a criterion not flagged unmet by edge-A is counted met (total − unmet). unknown is now always 0, reserved pending per-criterion verdicts from edge-A. Updates three test_verdict assertions and the stale write-intent sync-skill test. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
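The silence=met arithmetic is small enough to write out. The helper name, the return keys, and the clamp that keeps `unmet` from exceeding `total` are assumptions; only `met = total − unmet` and `unknown = 0` come from the commit message.

```python
def completeness(acceptance_criteria, unmet_count):
    """Hypothetical helper: a criterion not flagged unmet is counted as met."""
    total = len(acceptance_criteria or [])  # tolerate a stored null
    unmet = min(unmet_count, total)         # assumption: never exceed the total
    return {"total": total, "unmet": unmet, "met": total - unmet, "unknown": 0}


result = completeness(["ac-1", "ac-2", "ac-3", "ac-4", "ac-5", "ac-6"], 3)
print(f"{result['met']}/{result['total']}")  # 3/6
```

The trade-off is that a criterion edge-A never looked at is reported as met; per the PR description, an honest met/unknown split is deferred until edge-A emits per-criterion verdicts.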

vercel · 2026-07-02T21:04:49Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
archie	Ready	Preview, Comment	Jul 2, 2026 9:05pm
archie-viewer	Ready	Preview, Comment	Jul 2, 2026 9:05pm

gbrbks and others added 30 commits July 1, 2026 15:47

docs(review): delivery-review design + implementation plan

0b8b9a9

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

feat(review): evidence schema + finding builder

b89498e

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

feat(review): diff-basis base-detection ladder

7fd9643

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

fix(test): use repo sys.path import convention (py3.9 tomllib)

d949592

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

docs(review): plan import convention for py3.9 (sys.path + bare import)

fb83440

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

feat(review): behavioral reviewer (prompt/parse + blast radius)

f1b67e4

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

docs(review): LLM run-injection convention for mockable seams

c2eec1d

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

feat(review): intent_spec normalization + confidence ceiling

1302838

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

feat(review): reconciliation edge A (intent vs diff)

a8879e3

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

feat(review): delivery verdict aggregation

a04b21a

Append aggregate_verdict() function to reconcile.py for computing {intent_completeness, breaks, conflicts, gate_signal} from confirmed findings per intent spec. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

feat(deep-scan): cold-read editor gate in finalize

a60cd7b

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

feat(sync): light delivery review with skip-gate

b263888

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

fix(ci): guard delivery-review step when .archie not installed

adb8469

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

feat(review): wire delivery pipeline — finalize gate routing + sync r…

06a0137

…eview entrypoint Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

gbrbks and others added 27 commits July 1, 2026 18:43

fix(review): editor_gate coercion + comment hardening + workflow base…

0e84deb

…-ref exec + resolve persistence Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

feat(review): conformance_break producer + real PR changed_lines

98c4868

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

docs(review): branch intent capture design (committed intent file + o…

4061fbd

…ptional Linear) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

docs(review): branch intent capture implementation plan

9f8d69b

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

docs(review): descope Linear from branch-intent plan (committed file …

14d10cc

…+ PR body) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

docs(review): add Task 6 — intent-aware code review (behavioral + con…

42eb40f

…formance) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

feat(intent): committed intent file read/write + merge_specs

fd1d70b

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

feat(sync): write-intent subcommand writes committed .archie/intent.json

331cf85

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

feat(review): sync review reads committed .archie/intent.json

36ec899

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

feat(pr-gate): assemble intent from committed file + PR body

641ebda

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

feat(sync): capture branch intent step writes committed .archie/inten…

bc33226

…t.json Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

feat(review): intent-aware code review (behavioral + conformance)

bad9996

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

docs(review): clean-room intent capture design (hook signal + isolate…

ae02198

…d blind transform) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

docs(review): clean-room intent capture implementation plan (6 tasks)

1fda639

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

docs(review): silent intent capture (no mid-work nag)

41841a9

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

feat(intent): deterministic intent-event log + transition state machine

4d17db0

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

vercel Bot deployed to Preview – archie-viewer July 2, 2026 21:04 View deployment

vercel Bot deployed to Preview – archie July 2, 2026 21:05 View deployment

gbrbks wants to merge 58 commits into main from feature/archie-delivery-review
