feat(scripts): hcg-policy-smoke.sh — §1.5 operator pre-check (Phase E) #210

Draft
hyperpolymath wants to merge 1 commit into main from phase-e/policy-deny-smoke
@hyperpolymath

Owner

Summary

Lands scripts/hcg-policy-smoke.sh, a checked-in smoke test that exercises
the HCG tier-2 live Verb Governance Spec
(config/gateway-policy-boj.yaml) from outside the gateway. It replaces
the manual probe sequence that the last open item of rollout runbook §1.5
formerly described as "out of band — operator pre-check".

Single-lane HCG tier-2 channel (standards#91). Phase A (#96), B (#97),
C (#98), D (#99) are joint-closed; Phase E (standards#100) is the
active phase. This PR lands one tractable artefact (§1.5 operator
pre-check now checked-in and reproducible); staging soak (§2),
production traffic split (§3), and the §6.4 Trustfile flip remain
owner-driven.

What this PR lands

  • scripts/hcg-policy-smoke.sh — POSIX-conformant bash + curl, no
    jq/yq dependency.

    • Deny mode (default): sends one no-trust-header probe to every
      non-public route in the live policy (25 routes spanning the 19
      authenticated and 6 internal+stealth entries) and asserts a 4xx
      response. The 4xx assertion covers both bare 403 and stealth-profile
      codes regardless of the gateway's :stealth_profiles runtime
      config. It also sends a default-deny verb canary (DELETE /cartridges, PUT
      /health, PATCH /cartridges) to confirm that global_verbs: [GET, POST]
      enforces the ADR-0004 verb-governance invariant for unlisted verbs.
      This mode is gateway-internal: BoJ does not have to be reachable.
    • --with-backend mode: additionally probes the allow path with
      X-Trust-Level: authenticated (and internal for internal+stealth
      routes), asserting the response is NOT a gateway-origin 4xx (2xx /
      3xx / 5xx all pass — BoJ's own status is fine; only a gateway deny
      is a failure). Requires BoJ reachable at the gateway's BACKEND_URL
      and the script to run from a trusted-proxy IP so the trust header is
      not stripped by the gateway's strip_untrusted_headers plug.
    • Exits 0 on all-PASS, 1 on any FAIL (with per-probe summary), 64 on
      usage error.
  • Runbook §1.5 — last unchecked operator pre-check item flips from
    a free-form "stand the gateway up ... exercise one allow + one deny
    per route" sequence (which was deferred to boj-server#165's test plan
    and documented as out-of-band) to a single
    scripts/hcg-policy-smoke.sh invocation. The PASS/FAIL summary
    attaches to the cut-over ticket; a single FAIL is a stop-the-rollout
    condition, with the three failure modes named (policy not enforcing,
    BoJ unreachable, trust header stripped because the caller is not a
    trusted proxy).

  • Runbook header — version 0.3 → 0.4; date 2026-06-09 → 2026-06-10;
    status line acknowledges the smoke script landing alongside the
    existing live policy promotion.

  • Runbook Appendix B — new cross-reference entry for
    scripts/hcg-policy-smoke.sh.
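
The pass/fail rules and exit codes described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the contents of scripts/hcg-policy-smoke.sh: the function names `verdict` and `summarise` are hypothetical, and treating anything other than 2xx / 3xx / 5xx (including a transport failure reported as 000) as FAIL in allow mode is an assumption.

```shell
# Hypothetical sketch of the probe verdict rules; names are illustrative only.

# verdict MODE STATUS -> prints PASS or FAIL
#   deny  : PASS only when the gateway answers 4xx
#   allow : PASS on 2xx / 3xx / 5xx (anything else is FAIL, by assumption)
verdict() {
  case "$1:$2" in
    deny:4[0-9][0-9])      echo PASS ;;
    allow:[235][0-9][0-9]) echo PASS ;;
    *)                     echo FAIL ;;
  esac
}

# summarise: reads PASS/FAIL lines on stdin, prints the totals,
# and returns 0 only when there was no FAIL.
summarise() {
  sm_pass=0
  sm_fail=0
  while IFS= read -r sm_line; do
    case "$sm_line" in
      PASS) sm_pass=$((sm_pass + 1)) ;;
      FAIL) sm_fail=$((sm_fail + 1)) ;;
    esac
  done
  echo "PASS=$sm_pass FAIL=$sm_fail"
  [ "$sm_fail" -eq 0 ]
}

# Example: two passing probes and one failing probe.
{ verdict deny 403; verdict deny 200; verdict allow 502; } | summarise \
  || echo "a real run would exit 1 here"
```

A caller would map the return status of `summarise` to exit code 0 or 1 and reserve 64 for argument errors, as the bullet above states.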

What this PR deliberately does NOT do

  • Close standards#100. Per runbook §6.5 the joint-close happens
    after the §6.4 Trustfile flip (tier_2_gateway.status: PENDING →
    DEPLOYED), which itself follows the §3.3 100% production-soak
    window. This PR uses Refs to match the Phase E PR convention
    established by #208 / #38 / #168 and documented in §6.5 ("Do not
    self-close standards#100; joint-close is owner-only per the
    single-lane channel discipline"). The owner remains the sole closer
    of standards#100.
  • Touch HCG. This is a BoJ-side artefact: the script lives in
    scripts/, reads config/gateway-policy-boj.yaml, and probes the
    gateway over HTTP. No companion PR on the gateway repo required.
  • Run during CI deployment. The script is checked in but only the
    operator's explicit invocation against a live gateway URL exercises
    it. CI does not stand up a gateway to run it (would require an
    external service); the script is intentionally operator-driven, with
    the PASS/FAIL summary attached to the cut-over ticket as the
    evidence-of-pre-check artefact.
  • Diverge the policy from the script's route matrix. The script's
    route matrix mirrors the 25-route live policy. When the policy file
    evolves (new BoJ surface routes wired in), the script must be updated
    in lock-step — that is a benefit not a cost (the script doubles as a
    policy-completeness checklist), but it must be observed.

Verification

  • bash -n scripts/hcg-policy-smoke.sh — syntax check passes.
  • scripts/hcg-policy-smoke.sh (no args) — exits 64 (usage error).
  • scripts/hcg-policy-smoke.sh --help — exits 64 with full help.
  • Against a synthetic always-403 mock on :18443 — PASS=28 FAIL=0,
    exits 0 (deny-only mode covers all 25 policy routes + 3 verb
    canaries).
  • Against a closed port (no gateway up) — every probe FAILs with
    got=000 expected=deny; exits 1 with the FAIL line summary.
  • SPDX header MPL-2.0 matches repo convention (scripts/, docs/).
  • Runbook cross-references resolve (§1.5, Appendix B, sibling
    docs).
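
The closed-port result above is consistent with how curl reports status codes: with `-w '%{http_code}'` it prints 000 when no HTTP response was received. A minimal sketch of such a probe follows, assuming a hypothetical helper name `probe_status` and a 5-second timeout; neither is taken from the script itself.

```shell
# Hypothetical probe helper; not the script's actual implementation.

# probe_status METHOD URL -> prints the three-digit HTTP status,
# or 000 when curl received no HTTP response (or curl is unavailable).
probe_status() {
  ps_code=$(curl -s -o /dev/null -w '%{http_code}' \
                 --max-time 5 -X "$1" "$2" 2>/dev/null) || true
  printf '%s\n' "${ps_code:-000}"
}

# Example invocation (GATEWAY_URL is a placeholder for the operator's target):
#   code=$(probe_status GET "$GATEWAY_URL/cartridges")
#   echo "got=$code expected=deny"
```

Because a refused connection yields 000 rather than a 4xx, a deny probe against a gateway that is not running is reported as a FAIL instead of being mistaken for an enforced policy.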

Channel position

standards#91 (parent, open)
├── #96 Phase A — closed (boj-server: contract + policy-authoring + example; gateway: -)
├── #97 Phase B — closed (gateway#10: mTLS primary path)
├── #98 Phase C — closed (gateway#11: strip; boj-server#106: TrustPolicy clause)
├── #99 Phase D — closed (boj-server#168 on 2026-06-01; gateway#12/#14/#22/#26/#30)
└── #100 Phase E — IN PROGRESS
     ├── E5 runbook draft — boj-server#128 (landed; rehearsal pending)
     ├── E1 loopback prereqs — boj-server#130/#131/#132/#165/#173 (landed)
     ├── E1 deploy spec — http-capability-gateway#38 (landed)
     ├── E1 live policy promotion — boj-server#208 (landed)
     ├── §1.5 operator pre-check smoke — THIS PR (in review)
     ├── E1 .ctp signing — owner follow-up
     ├── E2 staging cut-over — owner follow-up
     ├── E3 telemetry verification — owner follow-up
     ├── E4 production rollout — owner follow-up
     └── §6.4 Trustfile flip + §6.5 joint-close — owner-only

Refs hyperpolymath/standards#91
Refs hyperpolymath/standards#100

🤖 Generated with Claude Code



Lands `scripts/hcg-policy-smoke.sh`: a checked-in smoke test that exercises
the HCG tier-2 live Verb Governance Spec
(`config/gateway-policy-boj.yaml`) from outside the gateway. It replaces
the manual probe sequence that the last open item of rollout runbook §1.5
formerly described as "out of band — operator pre-check".

What the script does, by default:

* Sends one no-trust-header deny probe to every non-public route in
  the live policy (25 routes spanning the 19 authenticated and 6
  internal+stealth entries — `cartridges`, `cartridge/:name(/invoke|
  /sse|/load|/unload|/reload)`, `umoja/*`, `coprocessor/*`,
  `sdp/status`, `graphql`, `sse`, `order(-ticket)`, `community/*`,
  etc.) and asserts a 4xx response, covering both bare 403 and
  stealth-profile codes regardless of the gateway's
  `:stealth_profiles` runtime config.
* Sends a default-deny verb canary (DELETE /cartridges, PUT /health,
  PATCH /cartridges) to confirm `global_verbs: [GET, POST]` enforces
  the ADR-0004 verb-governance invariant for un-listed verbs.
* Exits non-zero with a per-probe FAIL summary on any mismatch.

This entire mode is gateway-internal — BoJ does not have to be
reachable. It can run during §1.5 staging stand-up before BoJ is wired
behind the gateway, and it'll catch a policy-not-loaded or
policy-not-enforcing regression at the cheapest possible step.

With `--with-backend`, the script additionally probes the allow path
with `X-Trust-Level: authenticated` (and `internal` for
internal+stealth routes), asserting the response is NOT a
gateway-origin 4xx (2xx / 3xx / 5xx all pass — BoJ's own status is
fine; only a gateway deny is a failure). This second mode requires
BoJ reachable at the gateway's `BACKEND_URL` and the script to run
from a trusted-proxy IP so the trust header is not stripped by the
gateway's `strip_untrusted_headers` plug.
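
The choice of trust header per route can be sketched as below. This is an illustration only: the helper name `trust_header_for` and the exposure labels passed to it are hypothetical, chosen to mirror the "authenticated" and "internal+stealth" groupings named in this message, and sending no header for any other label is an assumption.

```shell
# Hypothetical helper; the exposure labels are illustrative.

# trust_header_for EXPOSURE -> prints the header line for the allow probe,
# or nothing when no trust header should be sent.
trust_header_for() {
  case "$1" in
    authenticated)    printf '%s\n' 'X-Trust-Level: authenticated' ;;
    internal+stealth) printf '%s\n' 'X-Trust-Level: internal' ;;
    *)                : ;;  # public or unknown: no trust header (assumption)
  esac
}

# Example: show which header each exposure class would receive.
for exposure in authenticated internal+stealth public; do
  printf '%s -> %s\n' "$exposure" "$(trust_header_for "$exposure")"
done
```

The resulting line would be passed to curl with `-H`. If the caller is not on a trusted-proxy IP, the gateway strips the header and the allow probe sees a deny.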

Runbook diff:

* §1.5 — last unchecked operator pre-check item flips from a free-form
  "stand the gateway up ... exercise one allow + one deny per route"
  sequence (which was deferred to boj-server#165's test plan and
  documented as out-of-band) to a single `scripts/hcg-policy-smoke.sh`
  invocation. The PASS/FAIL summary attaches to the cut-over ticket; a
  single FAIL is a stop-the-rollout condition, with the three failure
  modes named (policy not enforcing, BoJ unreachable, trust header
  stripped because the caller is not a trusted proxy).
* Header — version 0.3 → 0.4; date 2026-06-09 → 2026-06-10; status line
  acknowledges the smoke script landing alongside the existing live
  policy promotion.
* Appendix B — new cross-reference entry for `scripts/hcg-policy-smoke.sh`.

What this PR deliberately does NOT do:

* **Close `standards#100`.** Per runbook §6.5 the joint-close happens
  after the §6.4 Trustfile flip
  (`tier_2_gateway.status: PENDING → DEPLOYED`), which itself follows
  the §3.3 100% production-soak window. This PR uses `Refs` to match the
  Phase E PR convention established by #208 / #38 / #168 and
  documented in §6.5 ("Do not self-close standards#100; joint-close is
  owner-only per the single-lane channel discipline").
* **Touch HCG.** This is a BoJ-side artefact: the script lives in
  `scripts/`, reads `config/gateway-policy-boj.yaml`, and probes the
  gateway over HTTP. No gateway-repo change required.
* **Run during deployment.** The script is checked in but only the
  operator's explicit invocation against a live gateway URL exercises
  it. CI does not stand up a gateway to run it (would require an
  external service); the script is intentionally operator-driven.

Verification:

* `bash -n scripts/hcg-policy-smoke.sh` — syntax check passes.
* `scripts/hcg-policy-smoke.sh` (no args) — exits 64 (usage error).
* `scripts/hcg-policy-smoke.sh --help` — exits 64 with full help.
* Against a synthetic always-403 mock on :18443 — `PASS=28 FAIL=0`,
  exits 0 (deny-only mode covers all 25 policy routes + 3 verb
  canaries).
* Against a closed port (no gateway up) — every probe FAILs with
  "got=000 expected=deny"; exits 1 with the FAIL line summary.

Refs hyperpolymath/standards#91
Refs hyperpolymath/standards#100

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@github-actions


🔍 Hypatia Security Scan

Findings: 273 issues detected

Severity Count
🔴 Critical 15
🟠 High 137
🟡 Medium 121

⚠️ Action Required: Critical security issues found!

View findings
[
  {
    "reason": "Stale AI session file -- delete",
    "type": "stale",
    "file": "GEMINI.md",
    "action": "delete",
    "rule_module": "root_hygiene",
    "severity": "medium"
  },
  {
    "reason": "Action  if: always()\n        uses: actions/upload-artifact@ea165f8 needs attention",
    "type": "unpinned_action",
    "file": "e2e.yml",
    "action": "pin_sha",
    "rule_module": "workflow_audit",
    "severity": "medium"
  },
  {
    "reason": "Action perpolymath/standards/.github/workflows/governance-reusable.yml@main\n needs attention",
    "type": "unpinned_action",
    "file": "governance.yml",
    "action": "pin_sha",
    "rule_module": "workflow_audit",
    "severity": "medium"
  },
  {
    "reason": "Issue in abi-drift.yml",
    "type": "missing_timeout_minutes",
    "file": "abi-drift.yml",
    "action": "flag",
    "rule_module": "workflow_audit",
    "severity": "medium"
  },
  {
    "reason": "Issue in codeql.yml",
    "type": "missing_timeout_minutes",
    "file": "codeql.yml",
    "action": "flag",
    "rule_module": "workflow_audit",
    "severity": "medium"
  },
  {
    "reason": "Issue in container-publish.yml",
    "type": "missing_timeout_minutes",
    "file": "container-publish.yml",
    "action": "flag",
    "rule_module": "workflow_audit",
    "severity": "medium"
  },
  {
    "reason": "Issue in dogfood-gate.yml",
    "type": "missing_timeout_minutes",
    "file": "dogfood-gate.yml",
    "action": "flag",
    "rule_module": "workflow_audit",
    "severity": "medium"
  },
  {
    "reason": "Issue in dogfood-gate.yml",
    "type": "missing_timeout_minutes",
    "file": "dogfood-gate.yml",
    "action": "flag",
    "rule_module": "workflow_audit",
    "severity": "medium"
  },
  {
    "reason": "Issue in dogfood-gate.yml",
    "type": "missing_timeout_minutes",
    "file": "dogfood-gate.yml",
    "action": "flag",
    "rule_module": "workflow_audit",
    "severity": "medium"
  },
  {
    "reason": "Issue in dogfood-gate.yml",
    "type": "missing_timeout_minutes",
    "file": "dogfood-gate.yml",
    "action": "flag",
    "rule_module": "workflow_audit",
    "severity": "medium"
  }
]

Powered by Hypatia Neurosymbolic CI/CD Intelligence
