fix(scenarios): provision rpc fleet via provision-node, not provision-snd #427

Merged
bdchatham merged 3 commits into main from fix/scenarios-rpc-provision-node
Jun 22, 2026
@bdchatham

Collaborator

Coral-xreview blocker on the WS-C nightly migration (platform #1165): the externally-fetched load-test/release-test scenarios still provisioned the RPC fleet with provision-snd --role=rpc on a kind: SeiNode template — but provision-snd creates a SeiNetwork (strict-unmarshal), so the RPC step render-fails the first time the nightly cron fetches these scenarios → load + release dead on arrival.

Change

Migrate the RPC provision step in both scenarios from provision-snd to provision-node (standalone SeiNode followers), mirroring the in-repo chaos scenarios already migrated in #1165:

  • --replicas (load-test=2 — seiload drives all N via RPC_EVM_RPC_LIST; release-test=1 — mocha hits a single RPC_TM_RPC/EVM/REST)
  • --network=$SEI_CHAIN_ID — genesis peer auto-wire (sei.io/seinetwork=<chain>)
  • --running-timeout (was --ready-timeout; SeiNode has no Ready phase)

The validator steps intentionally stay provision-snd (genesis SeiNetwork). Also refreshed two stale RPC-publishing comments.
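
As a rough illustration of how the two scenarios differ after the migration, the sketch below assembles the provision-node arguments for each. Only the flag names (`--replicas`, `--network`, `--running-timeout`) and the replica counts come from this PR; the `--flag=value` syntax for every flag, the `18m` timeout value (taken from the budget discussion in the later commits), the placeholder chain ID, and the helper name `provision_node_args` are assumptions, and how the task image actually invokes the subcommand is not shown here.

```python
# Hypothetical helper: builds an argument list for the provision-node step.
# Flag names are from the PR description; value formats are assumed.
def provision_node_args(replicas: int, chain_id: str, running_timeout: str) -> list[str]:
    return [
        "provision-node",
        f"--replicas={replicas}",                # load-test=2, release-test=1
        f"--network={chain_id}",                 # genesis peer auto-wire
        f"--running-timeout={running_timeout}",  # replaces --ready-timeout
    ]


# "example-chain" stands in for $SEI_CHAIN_ID.
load_args = provision_node_args(2, "example-chain", "18m")
release_args = provision_node_args(1, "example-chain", "18m")

print(" ".join(load_args))
print(" ".join(release_args))
```

The only per-scenario difference in this sketch is the replica count, which matches the description above: two followers for the seiload fleet, one for mocha's single RPC endpoint.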

Note

provision-node exists in the pinned seitask image (adca2d5), so no image bump needed; platform #1191 bumps SCENARIO_REF to this commit. Validated YAML.

🤖 Generated with Claude Code

…-snd

The load-test/release-test scenarios still provisioned the RPC fleet with
`provision-snd --role=rpc` on a kind: SeiNode template — but provision-snd was
repurposed to create a SeiNetwork (strict-unmarshal), so the RPC step render-fails
the moment the nightly fetches these scenarios. Migrate the RPC step to
provision-node (standalone SeiNode followers), matching the in-repo chaos
scenarios: --replicas (load=2 for the seiload fleet, release=1 for mocha's single
RPC) + --network for genesis peer auto-wire + --running-timeout (SeiNode has no
Ready phase). Validator steps stay provision-snd (genesis SeiNetwork). Refresh two
stale RPC-publishing comments (provision-snd -> provision-node; single-node release).

Coral xreview (sei-network-specialist) blocker on the WS-C nightly migration.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@cursor

cursor Bot commented Jun 22, 2026


PR Summary

Medium Risk
Touches core nightly chaos workflow provisioning for load and release validation; misconfiguration could still break benchmarks, but the change corrects an already-failing path and mirrors in-repo patterns.

Overview
Fixes load-test and release-test nightly workflows that failed at the RPC step because provision-snd only provisions SeiNetwork resources while the RPC templates are SeiNode followers.

The provision-rpc-fleet task now calls provision-node with --replicas (2 for load-test, 1 for release-test), --network=$SEI_CHAIN_ID for genesis peer wiring, and --running-timeout instead of --ready-timeout. Validator provisioning stays on provision-snd.

Task deadlines are raised (42m / 32m) with inline budgeting for sequential catch-up and EVM readiness plus Chaos Mesh scheduling/image-pull slack. Comments are updated for provision-node endpoint publishing, the redundant wait-rpc-caught-up gate, and single-node RPC routing in release-test.

Reviewed by Cursor Bugbot for commit 8956eb2.

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes using default effort and found 1 potential issue.

Reviewed by Cursor Bugbot for commit d29f3d3.

Comment thread scenarios/load-test.yaml
…deadline budget

Coral /xreview (sei-network dissenter + platform-engineer + prose-steward):
- load-test wait-rpc-caught-up comment was stale + self-contradictory (claimed
  SeiNetwork-Ready / aggregate-load-balanced RPC_TM_RPC / provision-snd parser);
  under provision-node, RPC_TM_RPC is node-0 (not an aggregate) and provision-node
  already gates all followers caught-up before publishing, so the step is a
  redundant re-confirm. Rewrote the comment to say so honestly.
- bump provision-rpc-fleet Task deadline (load 25m->40m, release 25m->30m) to
  cover worst-case sequential readiness (running-timeout 18m + N×2×first-block 5m),
  so provision-node's typed exit surfaces before a Chaos-Mesh deadline kill.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@bdchatham

Collaborator Author

/xreview ledger (Class: cross-component, T2)

Slate requested: systems-engineer (esp.), sei-network-specialist (dissenter), platform-engineer, prose-steward.
Ran: sei-network-specialist ✅, platform-engineer ✅, prose-steward ✅. systems-engineer ❌ blocked (account out of usage credits, resets Jun 30).

| Boundary | Lens | Verdict | Resolution |
| --- | --- | --- | --- |
| key-name contract (`RPC_*` vars) | sei-network, platform | COMPATIBLE | provision-node publishes RPC_EVM_RPC_LIST/RPC_EVM_RPC/RPC_TM_RPC/RPC_REST/CHAIN_ID byte-identical to provision-snd |
| peer-wiring (--network) | sei-network | COMPATIBLE | synthesized `sei.io/seinetwork=<chain>` LabelPeerSource resolves; followers genuinely sync |
| replica counts (load=2, release=1) | sei-network, platform | COMPATIBLE | load feeds EVM_RPC_LIST; release collapses to one node for stateful mocha |
| flag contract @ adca2d5 | platform | COMPATIBLE | all flags exist; --ready-timeout → --running-timeout rename mandatory |
| workflow-vars CM name/ns + ownerRef cascade | platform | COMPATIBLE | `workflow-vars-<wf>` matches downstream envFrom; ownerRef intact |
| wait-rpc-caught-up tautology / stale "aggregate" comment | sei-network (correctness), prose (correctness), platform (advisory) | RESOLVED | RPC_TM_RPC is node-0 (not aggregate); provision-node gates caught-up inline → comment rewritten to label the step an honest redundant re-confirm (2be6a6c) |
| Task deadline 25m < worst-case readiness budget | sei-network (correctness), platform (advisory) | RESOLVED | bump provision-rpc-fleet deadline (load→40m, release→30m) past 18m + N×2×5m so typed exit beats the Chaos-Mesh kill (2be6a6c) |
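
The deadline arithmetic in the last row can be checked with a few lines of Python. This is only a worked instance of the 18m + N×2×5m formula quoted in the ledger, compared against the 40m/30m deadlines set in 2be6a6c; the constant and function names are illustrative and do not come from the repository.

```python
# Figures quoted in the ledger: one shared 18m running-timeout, plus
# two 5m readiness waits per node.
RUNNING_TIMEOUT_MIN = 18
WAITS_PER_NODE = 2
WAIT_TIMEOUT_MIN = 5


def worst_case_minutes(replicas: int) -> int:
    """Worst-case readiness budget: 18m + N x 2 x 5m."""
    return RUNNING_TIMEOUT_MIN + replicas * WAITS_PER_NODE * WAIT_TIMEOUT_MIN


# (scenario, replicas, Task deadline in minutes as of 2be6a6c)
for name, replicas, deadline in [("load-test", 2, 40), ("release-test", 1, 30)]:
    budget = worst_case_minutes(replicas)
    print(f"{name}: budget={budget}m deadline={deadline}m slack={deadline - budget}m")
```

With N=2 the budget is 38m and with N=1 it is 28m, so both of the original 25m deadlines were below the worst case, and the bumped values leave 2m of slack each.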

Prose addendum (advisory, deferred): SND-nomenclature in both file headers + provision-snd in release-test's primitives list + the was --ready-timeout history breadcrumb — accurate-enough divergences, worth a sweep but non-gating.

State: OPEN — systems-engineer lens (operator-requested) pending on credit reset. Dissenter ran and its correctness findings are resolved; platform COMPATIBLE; prose correctness resolved. Holding merge for the systems pass per operator request (or operator accept-with-risk: sei-network + platform independently covered the systems brief — the readiness/budget/reliability boundary).

…duling

systems-engineer xreview: the 2m outer slack (40/38, 30/28) must also
absorb pod scheduling + cold $SEITASK_IMAGE pull, which sit outside the
inner readiness budget. A cold pull could trigger the opaque deadline-kill
the budget exists to prevent. Bump to 42m/32m (4m headroom).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@bdchatham

Collaborator Author

/xreview ledger — UPDATED (systems-engineer lens closed)

The previously credit-blocked systems-engineer lens has now run. State: RESOLVED.

| Boundary | Lens | Verdict | Resolution |
| --- | --- | --- | --- |
| deadline-vs-budget math + typed-exit-beats-kill invariant | systems-engineer | COMPATIBLE | budget verified against provision.go: waits are strictly sequential (one shared 18m waitForRunning + a serial per-node loop of WaitCaughtUp→WaitEVMServing, 5m each), so N=2→38m / N=1→28m is the true worst-case wall clock. Invariant holds. |
| sequential-vs-concurrent (N× multiplier) | systems-engineer | COMPATIBLE | confirmed serial for loop, no goroutines; 18m is a flat shared term, only the readiness probes are ×N, so the comment is accurate. |
| provision-snd→provision-node failure modes | systems-engineer | COMPATIBLE | no regression; stronger semantics: no silent partial-fleet, first-non-ready-node fails the whole Task with the node name, idempotent on Task restart (AlreadyExists tolerated, fresh outer deadline per attempt). |
| failure observability | systems-engineer | COMPATIBLE (net improvement) | typed, node-localized signal (SeiNode name + gate /status caught-up vs eth_blockNumber; Infra vs Task exit class), strictly better than provision-snd's aggregate-fleet wait. |
| 2m outer slack vs pod-scheduling + cold image-pull | systems-engineer (advisory) | RESOLVED | the Chaos-Mesh deadline clock starts at Task admission, so the slack must also absorb scheduling + a cold $SEITASK_IMAGE pull (outside the inner budget). Bumped deadline 40m→42m (load) / 30m→32m (release) for 4m headroom (8956eb2). |
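
The serial wait structure and fail-fast behaviour described in this ledger can be modelled in a few lines. The sketch below is a toy Python simulation, not the code in provision.go: the node names, the `NodeNotReady` exception, and the `simulate_provision` helper are all hypothetical, and probe durations are supplied as plain numbers rather than measured.

```python
class NodeNotReady(Exception):
    """Raised for the first node that misses a readiness gate (hypothetical type)."""


def simulate_provision(nodes, probe_minutes, running_minutes=18, probe_timeout=5):
    """Return total wall-clock minutes for a strictly serial readiness loop.

    probe_minutes maps (node, gate) to how long that probe takes; a probe
    that exceeds probe_timeout fails the whole run and names the node.
    """
    elapsed = running_minutes  # one shared wait for all nodes to reach Running
    for node in nodes:  # serial: no concurrency between nodes
        for gate in ("caught-up", "evm-serving"):
            took = probe_minutes[(node, gate)]
            if took > probe_timeout:
                raise NodeNotReady(f"{node}: {gate} gate not passed within {probe_timeout}m")
            elapsed += took
    return elapsed


nodes = ["rpc-0", "rpc-1"]  # hypothetical node names
slowest = {(n, g): 5 for n in nodes for g in ("caught-up", "evm-serving")}
total = simulate_provision(nodes, slowest)
print(f"worst case: {total}m, headroom under a 42m deadline: {42 - total}m")

stuck = dict(slowest)
stuck[("rpc-1", "evm-serving")] = 6  # one probe overruns its 5m budget
try:
    simulate_provision(nodes, stuck)
except NodeNotReady as err:
    print(f"typed failure: {err}")
```

Because the per-node waits add rather than overlap, the slowest successful run for two nodes takes 38m, which leaves the 4m of headroom under the 42m load-test deadline for scheduling and image pull. A single overrunning probe fails the run with the node and gate named, instead of the run being cut off by the outer deadline.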

All 4 requested lenses have now run (sei-network dissenter ✅, platform ✅, prose ✅, systems-engineer ✅); all correctness findings resolved; the one systems advisory hardened rather than accepted. Merge gate clear pending CI re-green on 8956eb2.

@bdchatham bdchatham merged commit 1cc206c into main Jun 22, 2026
5 checks passed
@bdchatham bdchatham deleted the fix/scenarios-rpc-provision-node branch June 22, 2026 23:15