feat: Structural Integrity Score + PR gate, and automated-sync #104
Merged
…eep-scan & sync
A deterministic integrity layer: a worklist of open contract divergences (each
file:line + the decision/law it breaks) rolled up into one score. The number is
never the gate — only open grounded divergences in a diff block a PR.
- scoring.py: composite = min(weighted arithmetic body, geometric correctness-
ceiling over {Reconciliation, Product-Law Coverage}). No floor/drag magic
constants. Structural Health is an informational panel, not a headline axis.
Size-normalized axis derivations; absent != a free 100.
- score.py: reads .archie/ artifacts, computes the AIS + worklist, explains the
context (explain()), renders worklist-first terminal / PR-markdown views,
persists the committed baseline (score.json + history), and a diff-scoped gate
(--diff <base>, exit 1 on a grounded divergence in the diff).
- Wired into /archie-deep-scan Step 9 (baseline write + closing-summary line) and
/archie-sync (integrity standing after record). No new slash command.
- 22 tests; npm-package assets + installer in sync (verify_sync passes).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
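A minimal sketch of the composite described in this commit, assuming 0-100 axis scores. The axis names, the weights, and the choice to drop an absent axis from the averages (rather than count it as 100) are illustrative assumptions, not the actual scoring.py derivations.

```python
import math

# Hypothetical names for the two correctness axes; the commit calls them
# Reconciliation and Product-Law Coverage.
CORRECTNESS_AXES = ("reconciliation", "product_law_coverage")

def weighted_body(axes, weights):
    """Weighted arithmetic mean over the axes that are present (not None)."""
    present = {k: v for k, v in axes.items() if v is not None}
    total_weight = sum(weights[k] for k in present)
    if total_weight == 0:
        return None
    return sum(weights[k] * v for k, v in present.items()) / total_weight

def correctness_ceiling(axes):
    """Geometric mean over the correctness axes; one weak axis drags it down."""
    values = [axes[k] for k in CORRECTNESS_AXES if axes.get(k) is not None]
    if not values:
        return None
    return math.prod(values) ** (1.0 / len(values))

def composite(axes, weights):
    """Headline = min(body, ceiling), so correctness caps the score."""
    parts = [p for p in (weighted_body(axes, weights), correctness_ceiling(axes))
             if p is not None]
    return min(parts) if parts else None
```

With `reconciliation=100`, `product_law_coverage=25` and a third axis at 90 under equal weights, the body is about 71.7 but the ceiling is 50, so the ceiling wins.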
…/viewer bundle
build_bundle now packages .archie/score.json (headline, worklist of open divergences, and the plain-language explanation block) as bundle["integrity"], so the local viewer (/api/bundle) and /archie-share carry the context, not just a number. Rendering it in the React view is a frontend change (that source lives outside this repo).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ock a build
The "smoke alarm test": run each rule against known-good and known-bad code (including near-misses like the forbidden pattern sitting in a comment), measure precision/recall, and mark a rule block_eligible only if precision >= 0.95. Jumpy rules degrade to WARN. Reuses check_rules.py (the real gate engine); labels come from how each case is built, so it's non-circular. The demo catches a plausible raw-SQL rule that false-fires on a comment (precision 0.5 -> WARN).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
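The precision check can be sketched roughly as follows. The `Case` record and `calibrate` function are hypothetical names for illustration; the real harness reuses check_rules.py to decide whether a rule fired, which is not reproduced here. Treating a rule that never fires as precision 0.0 is also an assumption.

```python
from dataclasses import dataclass

@dataclass
class Case:
    should_fire: bool  # label, fixed by how the case was constructed
    fired: bool        # what the rule actually did on that case

def calibrate(cases, min_precision=0.95):
    """Compute precision/recall and decide whether the rule may block."""
    tp = sum(1 for c in cases if c.fired and c.should_fire)
    fp = sum(1 for c in cases if c.fired and not c.should_fire)
    fn = sum(1 for c in cases if not c.fired and c.should_fire)
    # Assumption: with no firings at all, precision is reported as 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "block_eligible": precision >= min_precision,
    }

# The raw-SQL demo: one true hit, one false hit on a comment.
raw_sql = [Case(should_fire=True, fired=True), Case(should_fire=False, fired=True)]
report = calibrate(raw_sql)
```

Here `report["precision"]` is 0.5, so the rule is not block-eligible and would degrade to WARN.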
The gate reads .archie/rule_calibration.json: a grounded divergence whose rule failed the calibration (block_eligible false, i.e. too jumpy) is demoted to a warning instead of failing the build. With no calibration data, behavior is unchanged, so calibration only ever tightens the gate. Adds write_calibration() to the harness. Demo: a raw-SQL rule that false-fires on a comment (precision 0.5) flips the gate from BLOCK to PASS-with-warning once calibrated.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
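A rough sketch of the demotion logic, assuming the calibration file maps a rule id to an object with a `block_eligible` flag and that each divergence is a dict with `rule` and `grounded` keys. Both shapes are assumptions made for illustration; the actual schema is not shown in this PR text.

```python
def gate(divergences, calibration=None):
    """Return (exit_code, blocking, warnings) for divergences in the diff.

    A rule with no calibration entry keeps blocking exactly as before.
    """
    calibration = calibration or {}
    blocking, warnings = [], []
    for d in divergences:
        if not d.get("grounded"):
            continue  # only grounded divergences can affect the gate
        entry = calibration.get(d["rule"])
        if entry is not None and not entry.get("block_eligible", True):
            warnings.append(d)  # jumpy rule: demoted to a warning
        else:
            blocking.append(d)
    return (1 if blocking else 0), blocking, warnings
```

Called with the raw-SQL divergence and no calibration, this returns exit code 1; with `{"raw-sql": {"block_eligible": False}}` it returns 0 and one warning.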
The god-function and empty-catch platform rules leaked an internal benchmark name into their user-facing descriptions. Rewrite both to plain, actionable guidance. The name now lives only in measure_health.py's internal docstring, never in surfaced text.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…onor ignores
Rename the headline from "Architecture Integrity" to "Structural Integrity" across the terminal report, PR comment, viewer panel + sidebar, and the deep-scan/sync workflow docs. The deterministic score only covers structurally-checkable rules (layering, dependency direction, placement, naming, DI wiring, law-enforcement presence), and the name now says so.
- Add an explicit "what this is NOT" (limits) to the explanation: it is not a code-quality grade and does not judge behavioral / product-law correctness; that stays in the LLM review layer. Surfaced in the terminal footer, the PR "how to read this", and the viewer panel.
- Group open divergences by (file, rule) into one worklist entry carrying a title + detail + the affected lines/count; render the grouped shape in all three surfaces.
- Honor .archieignore/.gitignore in score.py's LOC fallback via IgnoreMatcher, matching check_rules' read-boundary. The worklist already respected ignores; the size-normalization denominator now does too (proven: worklist 1->0 and LOC 5001->0 under .archieignore).
- Move the integrity panel into the viewer's Risks section + sidebar.
The result field stays named `ais` for back-compat with the share bundle, score.json, and the viewer.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
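The (file, rule) grouping might look something like this. The divergence dict keys (`file`, `rule`, `line`) and the output shape are hypothetical; the title and detail fields the commit mentions are left out to keep the sketch short.

```python
from collections import defaultdict

def group_divergences(divergences):
    """Collapse per-line divergences into one worklist entry per (file, rule)."""
    groups = defaultdict(list)
    for d in divergences:
        groups[(d["file"], d["rule"])].append(d["line"])
    return [
        {"file": f, "rule": r, "lines": sorted(lines), "count": len(lines)}
        for (f, r), lines in sorted(groups.items())
    ]

entries = group_divergences([
    {"file": "api/users.py", "rule": "layering", "line": 40},
    {"file": "api/users.py", "rule": "layering", "line": 12},
    {"file": "api/users.py", "rule": "naming", "line": 7},
])
```

Two hits of the same rule in the same file become a single entry listing both lines, so the worklist stays readable when one rule fires repeatedly.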
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…t guard
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Automated sync: self-propelling blueprint maintenance (Claude Code + Codex)
…a loop
The automated-sync merge (#103) updated only the canonical archie/assets + archie/standalone trees; the npm-package mirror was never synced, so `npx @bitraptors/archie` shipped the feature dead. Restore the mirror and close the Stop-hook loop bug.
F1 (npm distribution was broken):
- churn-track.sh was missing from npm-package/assets/hook_scripts/ (no churn hook would ship), and verify_sync didn't even check the hook_scripts mirror, which is how it slipped through.
- sync.py (the six new subcommands), the /archie-sync SKILL.md (Step 1b + consume-on-success), and manifest_data.py (the churn-track HookDef) were stale in the mirror.
- Adding the hook_scripts mirror check surfaced two more stale hooks the merge missed: post-plan-review.sh (plan-capture tee) and pre-commit-review.sh (commit advisory).
Fix: sync all of them; add check_hook_scripts_mirror() to verify_sync so a new/edited hook script can never silently fail to ship again.
F2 (the Stop nudge could not be declined):
stop.sh read no stdin and unconditionally exit-2'd while churn was crossed, ignoring stop_hook_active. When the agent declined and tried to stop again, the nudge re-fired and re-blocked: an indefinite loop that defeated the "Decline if nothing is worth recording" affordance.
Fix: read the Stop envelope and exit 0 when stop_hook_active is true (nudge once per stop attempt). Regression test added.
verify_sync green (now incl. hook_scripts); 34 sync/hook tests pass.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
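The F2 fix lives in stop.sh, a shell script; the decision it makes can be sketched in Python as below. The function name and the `churn_crossed` argument are hypothetical stand-ins, and the only field assumed from the Stop envelope is `stop_hook_active`, which the commit names.

```python
import json

def stop_hook_exit_code(envelope_text, churn_crossed):
    """Decide the Stop hook's exit code from the envelope read on stdin.

    Exit 2 blocks the stop and shows the nudge; exit 0 lets the agent stop.
    """
    try:
        envelope = json.loads(envelope_text) if envelope_text.strip() else {}
    except json.JSONDecodeError:
        envelope = {}  # assumption: an unreadable envelope is treated as empty
    if not isinstance(envelope, dict):
        envelope = {}
    if envelope.get("stop_hook_active"):
        return 0  # already nudged on this stop attempt: do not re-block
    return 2 if churn_crossed else 0
```

The pre-fix behavior corresponds to skipping the `stop_hook_active` check, which is why a declined nudge re-fired on every retry.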
Adds a durable, PR-visible reminder that the Living Blueprint may be stale: the boundary the session hooks can miss (small work, declined nudges, or a session that never hit a stop hook). Detection is content-based (Option B), so it survives rebases/squashes and works whether the synced changes were committed or not:
- `sync.py sync-stamp` writes committed `.archie/sync_state.json`, a sha1 fingerprint of every source file the sync reconciled (honoring .archieignore/.gitignore). Wired into the sync SKILL's consume-on-success step next to plan-consume / churn-reset.
- `intent_review.py` (the PR action) computes `sync_advisory()`: the PR's changed source files whose CURRENT content differs from the recorded sync (or all of them, if no sync was ever recorded). It posts a non-blocking "run /archie-sync" section in the existing review comment, and now runs even when there's no blueprint diff (the exact case it must catch), which previously short-circuited.
- Shared `_common.file_sha1` / `source_fingerprint` so the stamp and the check agree on a file's identity by construction.
Advisory only: never blocks merge, consistent with Archie's hook discipline. `sync_state.json` is a committed output (not gitignored), so it travels with the PR for CI to read.
Tests: sync-stamp fingerprint + the synced/drift/no-marker advisory paths + section rendering. verify_sync green; 113 tests pass.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
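A minimal sketch of the content-based check, assuming the recorded stamp is a mapping of relative path to sha1 hex digest. The names `file_sha1` and `sync_advisory` come from the commit message, but the signatures and the stamp layout here are assumptions, and ignore handling is omitted.

```python
import hashlib
from pathlib import Path

def file_sha1(path):
    """sha1 of a file's bytes; shared so the stamp and the check agree."""
    return hashlib.sha1(Path(path).read_bytes()).hexdigest()

def sync_advisory(changed_paths, recorded, root="."):
    """Changed files whose current content differs from the recorded stamp.

    With an empty `recorded` (no sync ever stamped), every changed file
    that still exists is reported.
    """
    stale = []
    for rel in changed_paths:
        path = Path(root) / rel
        if not path.is_file():
            continue  # nothing to re-sync for a path that no longer exists
        if recorded.get(rel) != file_sha1(path):
            stale.append(rel)
    return sorted(stale)
```

Because the comparison is on file content rather than commit ids, a rebase or squash that leaves the bytes unchanged produces no advisory.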
Adversarial review (15 confirmed findings) surfaced one real bug cluster plus three regressions from the previous restructure. Fixes:
Selection mismatch (the real bug: flagged source files forever):
- sync_advisory now derives its candidate set from source_fingerprint, the SAME universe the stamp records, so a tracked source file under a SKIP_DIRS dir (vendor/, Pods/, dist/) or a gitignored/.archieignore'd path is no longer flagged "unsynced" on every PR with no way to clear it.
- That intersection also drops deletions for free (gone from the universe), so a removed file is no longer surfaced as a phantom "re-sync this path".
- The diff now uses `-z`, so non-ASCII paths are no longer C-quoted and then dropped (was a silent false-NEGATIVE: drift in such files was never reported).
intent_review regressions:
- Model-call failure on a real blueprint diff now renders an explicit "Intent review could not run" notice instead of a clean-looking review.
- The review section always renders when the blueprint changed, so a later advisory-only run can't silently erase a prior review's context.
- The sync advisory is computed before the blueprint guards, so it still surfaces when the branch blueprint is absent or malformed.
sync-stamp hardening:
- Returns non-zero on failure (was exit 0, so a skipped stamp looked like success). Atomic write (temp + os.replace). sort_keys to kill os.walk ordering churn in the committed JSON. Warns on an empty fingerprint.
verify_sync: check_hook_scripts_mirror now walks the whole subtree (rglob + is_file), matching what the npx installer ships, not just *.sh.
Tests: ignored/SKIP_DIRS exclusion, deletion skip, model-failed notice. verify_sync green; 116 tests pass.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
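The "atomic write (temp + os.replace)" and `sort_keys` hardening follow a common pattern, sketched here under a hypothetical helper name `write_json_atomic`; it is not the actual sync.py code.

```python
import json
import os
import tempfile

def write_json_atomic(path, payload):
    """Write JSON to a temp file in the same directory, then swap it in.

    Readers see either the old file or the complete new one, never a
    partial write. sort_keys keeps the committed file stable regardless of
    the order in which files were discovered.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(payload, fh, sort_keys=True, indent=2)
            fh.write("\n")
        os.replace(tmp, path)
    except BaseException:
        # Remove the temp file if anything failed before the swap.
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
```

Creating the temp file in the target's own directory matters: `os.replace` is only atomic when source and destination are on the same filesystem.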
… cruft filter
A re-review of the prior fixes found the sync_advisory rewrite had introduced a perf regression and dropped a robustness guarantee.
- HIGH: sync_advisory walked + hashed the WHOLE repo via source_fingerprint on every PR (O(repo)), even for a docs-only diff: a real regression vs the old O(diff) path, biting large monorepos under fetch-depth:0. Now it classifies only the changed paths with a new per-path predicate `_common.is_source_path` (the exact per-file form of source_fingerprint's SOURCE_EXTENSIONS + SKIP_DIRS + ignore rules) and hashes only those: back to O(diff), still byte-consistent with the stamp, still excludes ignored/SKIP_DIRS files and skips deletions.
- MEDIUM: restored the "never raises" guarantee. The whole sync_advisory body is now guarded (source_fingerprint/IgnoreMatcher could have raised and broken the always-exit-0 Action contract).
- MEDIUM: verify_sync's hook-scripts check now ignores OS cruft (.DS_Store, __pycache__, *.pyc, *.tmp) so a stray macOS file can't false-positive the sync gate.
- LOW: cmd_sync_stamp cleans up its .tmp file if os.replace fails.
Tests: non-ASCII path (-z fix), main() posting the advisory with no branch blueprint (#7), and check_hook_scripts_mirror subtree-coverage + cruft filter. verify_sync green; 119 tests pass.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
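A sketch of the O(diff) classification: parse NUL-separated paths (as produced by a `-z` diff) and keep only those a per-path predicate accepts. The extension set is invented for illustration, the SKIP_DIRS values are just the three examples named earlier in this PR, and the `ignored` callback stands in for the real ignore matcher.

```python
from pathlib import PurePosixPath

SOURCE_EXTENSIONS = {".py", ".ts", ".swift"}  # hypothetical; real set not shown
SKIP_DIRS = {"vendor", "Pods", "dist"}        # examples named in this PR

def parse_z_paths(raw):
    """Split NUL-separated path bytes; no C-quoting to undo for non-ASCII."""
    return [chunk.decode("utf-8", "surrogateescape")
            for chunk in raw.split(b"\0") if chunk]

def is_source_path(rel_path, ignored=lambda p: False):
    """Per-path form of the fingerprint's extension, SKIP_DIRS and ignore rules."""
    path = PurePosixPath(rel_path)
    if path.suffix not in SOURCE_EXTENSIONS:
        return False
    if any(part in SKIP_DIRS for part in path.parts[:-1]):
        return False
    return not ignored(rel_path)

changed = parse_z_paths("src/é.py\0docs/a.md\0vendor/lib/x.py\0".encode("utf-8"))
candidates = [p for p in changed if is_source_path(p)]
```

Only `candidates` would then be hashed, so the cost scales with the diff rather than with the repository.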
Two related bodies of work land together on this branch (the automated-sync feature, PR #103, was merged into this branch and rides along — flag if you'd prefer them split before merge).
1. Structural Integrity Score + PR gate
A deterministic headline score measuring how well the code upholds the structurally checkable parts of its documented contract — layering, dependency direction, placement, naming, DI wiring, plus whether product laws have an enforcement mechanism. It is not a quality grade and does not judge behavioral/product-law correctness (that stays in the LLM review layer); generic complexity is shown as hygiene only.
- … scoring.py) + CLI (score.py): min(weighted body, geometric correctness-ceiling); worklist-first output (the worklist is the point, the number is the roll-up).
- … /archie-sync) / Open: only open grounded divergences in a diff block the gate (exit 1); the number never blocks.
- … block_eligible; jumpy rules demote to advisory.
- … /archie-deep-scan Step 9), and CI; rendered in the terminal, the PR comment, and the viewer (Risks section + sidebar).
- .archieignore/.gitignore honored across the worklist and LOC normalization.
2. PR sync-advisory (this session)
A durable, reviewer-visible nudge when code changed without an /archie-sync: the boundary the session hooks can miss.
- sync.py sync-stamp writes committed .archie/sync_state.json (a content fingerprint of reconciled source).
- intent_review.py flags PR-changed source files whose current content differs from the last stamp (content-based, so rebase/squash-immune; O(diff); honors ignore/SKIP_DIRS; skips deletions; -z so non-ASCII paths aren't dropped).
- Advisory only: never blocks merge.
3. Carried from PR #103 (automated-sync)
Background hooks accrue churn + captured plans and nudge /archie-sync at turn-end (exit 2, with a stop_hook_active loop guard) and at commit time; the sync skill consumes those signals. Also fixed: the merge had left the npm mirror out of sync (churn-track.sh + stale hooks/manifest/SKILL would have shipped the feature dead via npx). The mirror is restored, and verify_sync now guards the hook_scripts subtree.
Verification
verify_sync green (scripts + workflow + viewer + hook_scripts mirrors).
🤖 Generated with Claude Code