Skip to content

Archie Intent Review — PR-time semantic review + snapshot-vs-contract sync#102

Merged
gbrbks merged 15 commits into
mainfrom
feature/archie-intent-review
Jun 23, 2026
Merged

Archie Intent Review — PR-time semantic review + snapshot-vs-contract sync#102
gbrbks merged 15 commits into
mainfrom
feature/archie-intent-review

Conversation

@gbrbks

@gbrbks gbrbks commented Jun 23, 2026

Copy link
Copy Markdown
Collaborator

What

A new Intent Review GitHub Action that flags, on a PR, when a change deviates from the architectural source of truth — "you're breaking rule XY" — plus the snapshot-vs-contract refinement to /archie-sync that makes those deviations reliable.

The original idea: use the blueprint/rules as a baseline; when a dev finishes work, diff their change against the single source of truth and flag it on the PR; merge = the new baseline.

How it works

On pull_request, the Action (intent_review.py):

  1. Diffs the branch's folded .archie/blueprint.json + rules.json against the PR base SHA (deterministic, keyed diff — script owns what changed).
  2. Globs the sync ledger (.archie/changes/) for corroborating intent.
  3. Makes one Claude Haiku call to judge (not re-derive) each change: silent-weakening / contradiction / behavior-violates-rule, with a cited because.
  4. Posts one consolidated, non-blocking FYI comment — one finding per change, listing every colliding rule. Always exits 0; never blocks merge.

Snapshot vs. contract (sync Phase 1)

/archie-sync's code-fold now reconciles the descriptive mirror only and never silently moves the contract (rules/invariants):

  • advisory kinds (decision/pitfall/rule/guideline) are always staged, never auto-folded;
  • a contract fingerprint (invariants + prescriptive rule sections + rule files) surfaces contract_changed when the law is changed deliberately — visible, not blocked.

This guarantees a real code-vs-law deviation always reaches the PR instead of being papered over.

Contents

  • archie/standalone/intent_review.py — the zero-dep review engine (+ npm asset)
  • archie/assets/workflows/archie-intent-review.yml — the Action (checkout@v5/setup-python@v6)
  • archie/assets/setup-archie-intent-review.sh — one-command gh-based setup (no GitHub-web tinkering)
  • archie/standalone/sync.py — snapshot-vs-contract Phase 1
  • scripts/verify_sync.py, archie.mjs, install.py — distribution wiring (intent_review committed via gitignore exception so CI can run it)
  • docs/ — design doc, delivery plan, snapshot-vs-contract plan + implementation
  • tests/test_intent_review.py (47) + tests/test_sync.py updates

Validation

  • Full suite 1029 passed / 1 skipped; verify_sync green.
  • Three review rounds (adversarial verification, right-thing validation, deltas/CI/prompt) — blockers fixed (robust base-SHA, never-block guarantee, finding consolidation).
  • Dogfooded live on a real repo: caught a planted "remove the $1,175 billing floor" change and a real "raise the 7-step cap" change, citing the exact colliding rules.

Known limitations / deferred (by design)

  • Archie-native PRs only — reviews the folded blueprint, so it needs /archie-sync to have run; a raw code PR with no sync is invisible (Layer-3 raw-code reading deferred behind an eval gate).
  • Concurrent in-flight PRs — sequential "one by one" works; two simultaneously-open incompatible PRs aren't cross-checked.
  • Precision unmeasured at scale — strong anecdotes, but the eval harness (replay historical PRs, score precision/false-evolution) is deferred.
  • Resolution UX (Phase 2/3) — auto-drafted amendment + "Fix or Accept" deferred; the review surfaces, the human decides.

🤖 Generated with Claude Code

gbrbks and others added 15 commits June 19, 2026 18:19
Captures the brainstormed design for a PR-time semantic review that checks a
branch's folded blueprint/rules diff (branch vs base) against retained
invariants, posts an FYI comment, and never blocks. POC scope: GitHub Action +
zero-dep script. Documents that /archie-sync already folds into the blueprint
on the branch, the Layer 1/2 design, guardrails from the adversarial review,
the decisions log, dependency chain, and open questions for planning.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
End-to-end delivery plan (7 milestones, grounded against deep-scan/sync/
distribution internals via a research+critique+revise pass) plus the two
canonical setup assets the plan's M4/M5 produce:

- docs/archie-intent-review-delivery-plan.md — the plan
- archie/assets/workflows/archie-intent-review.yml — the Action (on: pull_request,
  fetch base ref, runs .archie/intent_review.py)
- archie/assets/setup-archie-intent-review.sh — idempotent gh-based one-command
  CI setup (prereq checks, silent secret via gh secret set, copies the canonical
  YAML, fork-PR caveat) — no GitHub-web tinkering

NOTE: not yet distributed. The file-sync wiring (npm-package/assets copies +
verify_sync.py .sh/.yml/plural-workflows checks + archie.mjs/install.py entries)
and the review engine intent_review.py are milestones M1a-M3 of the plan.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ment, tests + wiring

intent_review.py (zero-dep): deterministic keyed diff of branch-folded
blueprint/rules vs base ref, sync-ledger glob (union of change_*.json),
conservative ledger join (file overlap AND keyword), one Haiku tool_use call
that JUDGES only (script overwrites diff_op/ids/layer), because-or-suppress,
upserted FYI PR comment, always exit 0 (never blocks; fork/no-secret early skip).

Wiring (M1a): archie.mjs + install.py (+ mirror) script lists; verify_sync.py
now byte-checks the plural workflows/ mirror and the setup .sh (drift-tested).

Tests (M7): 25 cases, all green; full suite 1003 passed / 1 skipped; verify_sync
green. Workflow YAML + gh setup script are the M4/M5 canonical assets.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…, coverage

From a 6-agent adversarial verification pass:

Blockers:
- never-block guarantee: route both comment posts through safe_post_comment()
  which swallows URLError (covers HTTPError) + OSError — a network hiccup no
  longer fails the Action.
- data-section diff: removed a dead `pass` that let pure-ADD data models through
  unintentionally; the script now surfaces every data change and the model judges.

Coverage / faithfulness (plan-required sections):
- diff platform_rules.json too (unioned with rules.json).
- diff pitfalls (id), decisions.trade_offs + out_of_scope (title-hash).
- unenforced_invariants deliberately excluded (advisory gaps) — documented.
- item_key falls back to a full-item hash so title-less items don't collide.
- comment lookup follows Link-header pagination (no dup comment past 100).
- workflow: continue-on-error on the base-ref fetch.

Distribution:
- npx installer now places setup-archie-intent-review.sh and workflows/ into
  .archie/, so `bash .archie/setup-archie-intent-review.sh` works post-install
  and resolve_workflow_src() finds the canonical YAML.

Tests: +18 (43 total) — new sections, pagination, URLError-swallow, retry/backoff,
flag order, path-overlap, item_key fallback, and a full main() integration via a
real origin clone. Full suite 1021 passed / 1 skipped; verify_sync green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… coverage

Validation review (does it deliver the value, not just pass tests) surfaced the
linchpin risk and a coverage gap:

- Base ref: diff against github.event.pull_request.base.sha (always in merge-ref
  history with fetch-depth:0) instead of origin/<base>. Removed the fragile
  `git fetch` + continue-on-error step that could silently degrade the diff to
  "everything is new" and post a confident-but-wrong review.
- fetch_base_file now distinguishes "file genuinely absent at a valid ref"
  (legitimate all-ADD) from "ref unresolvable" — the latter posts a loud
  "review skipped" note rather than a misleading all-new result.
- Coverage: components[] now diffed (keyed, Layer 2) so component removal /
  responsibility changes are caught; communication/descriptive snapshots remain
  deliberately out of POC scope, now documented (not a silent divergence).

Tests: +3 (45 total) incl. unresolvable-ref-is-error, component-remove, and the
main() integration now drives the base-SHA path via a real origin clone. Full
suite 1023 passed / 1 skipped; verify_sync green. Delivery plan §10 records the
amendments. M6 dogfood remains the user's real-repo step.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
.archie/*.py is gitignored (regenerated by the installer), so intent_review.py
would never reach CI — the Action runs `python3 .archie/intent_review.py` where
no Archie install exists. Add it to the committed hook-runtime exception set
(alongside _common/lint_gate/align_check/arch_review) so it ships in the repo.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…iding rules

Real-repo test produced 8 near-identical findings (2 functions x 4 rules) for a
single change (cap 7->12). Now the model emits ONE finding per distinct change,
spanning multiple item_refs and listing ALL colliding_rules in a single cited
because. A dedup backstop merges any findings the model still splits (same type +
same colliding-rule set). Render shows "<change> (op, Layer N · K sites) — Collides
with: rule1, rule2…". 8 -> 1.

Tests: +2 (consolidate-across-items, dedup-merges-split); 47 in-file, full suite
1025 passed / 1 skipped; verify_sync green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Phased plan to split the blueprint into a mirror (sync auto-updates from the
diff) and a contract (the law — deliberate edits only), so Intent Review reliably
catches drift and distinguishes violation from intended amendment via the diff,
not commit prose. Phase 1: sync code-fold becomes contract-readonly. Phase 2:
deliberate amendment path. Phase 3: review labels amendment vs violation. Grounded
in sync.py _SECTION_MAP/fold-context/fold-apply + the cap worked example.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
No manual rule-editing: the human only Fixes (code complies) or Accepts (merge).
Accept's contract change is auto-drafted by the system (affected + interlocked
rules) and applied on merge — the "auto-drafted amendment" from the original
brainstorm. The merge-vs-fix choice is the intent signal. Phase 3 drafts the
amendment + presents Fix-or-Accept; Phase 2 applies the draft (in-PR suggestion
or on-merge reconcile).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Phase 1 (sync code-fold contract-readonly: gate _KIND_TARGET to mirror kinds +
byte-identity guardrail + rendered-doc audit + SKILL/tests), Phase 3 (review
drafts the consistent interlock amendment + Fix-or-Accept render + consistency
check), Phase 2 (apply the draft on Accept via in-PR suggestion or on-merge
reconcile). Grounded in sync.py _KIND_TARGET/fold-context/fold-apply and
intent_review.py. Ship order 1 -> 3 -> 2.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Goal narrowed to 'reliably see deviations on the PR'. The review already shows
them; the only remaining gap is Phase 1 — stop sync's code-fold from silently
moving the rules to match the code (which would hide a deviation). Drafting +
Fix-or-Accept + on-merge apply are deferred until the seeing-problems loop is solid.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…tract)

A code-fold must never silently move the law, or a real code-vs-contract deviation
would be hidden from the PR Intent Review.

- _classify: advisory kinds (decision/pitfall/rule/guideline) are ALWAYS `staged`,
  never eligible/folded — they're the contract. Only the descriptive mirror folds.
- fold-context: surfaces advisory claims as `staged_amendments` (proposed, not
  folded) and snapshots a contract fingerprint (invariant sections + rules.json +
  platform_rules.json).
- fold-apply: refuses (before render) any fold that changed the contract fingerprint.
- SKILL.md Step 4: edit the mirror only; advisory = proposed amendments.

Tests: advisory-always-staged, advisory-not-a-fold-target, contract guardrail
aborts on rules.json and invariant edits; updated the fold tests that encoded the
old advisory-folds behavior. test_sync 21 pass; full suite 1028 passed/1 skipped;
verify_sync green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Rules legitimately change during real work — the guardrail must not refuse them.
fold-apply no longer aborts when a fold also changed the contract; it proceeds and
reports contract_changed + a note, so the law can move DELIBERATELY but never
SILENTLY. advisory->staged still stops AUTOMATIC contract moves; deep-scan and
deliberate edits change rules as before. SKILL updated. test_sync 21 pass; full
suite green; verify_sync green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Audit gaps:
- contract fingerprint now covers the prescriptive blueprint sections too
  (development_rules / infrastructure_rules / architecture_rules), not just the
  invariants + rule files — these are the law-in-the-blueprint and no descriptive
  kind targets them, so it's safe.
- corrected stale comment ("rule is the only kind that edits rules.json") and the
  SKILL eligibility line to state advisory kinds are ALWAYS staged.

test_sync 22 pass; full suite green; verify_sync green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@vercel

vercel Bot commented Jun 23, 2026

Copy link
Copy Markdown

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
archie Ready Ready Preview, Comment Jun 23, 2026 5:57pm
archie-viewer Ready Ready Preview, Comment Jun 23, 2026 5:57pm

@gbrbks gbrbks merged commit 3034327 into main Jun 23, 2026
4 checks passed
@gbrbks gbrbks deleted the feature/archie-intent-review branch June 23, 2026 17:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant