
feat(test): TestBenchmark drives seiload (WS-I load suite)#430

Merged: bdchatham merged 2 commits into main from feat/wsi-seiload-drive on Jun 23, 2026
Conversation

@bdchatham

Collaborator

WS-I — complete the load suite (TestBenchmark)

Builds on the merged foundation (#428) + SDK Labels/DeletionPolicy (#429). After provisioning the chain + RPC fleet, TestBenchmark now drives seiload and asserts the chain survived the load.

Flow

  1. Render the platform seiload profile (read from the seiload-profiles ConfigMap) with the per-run chain id + the fleet's EVM endpoints; write a per-run profile CM stamped sei.io/harness-run.
  2. Apply seiload's own Job manifest (embedded template, parameterized — its Job spec is not constructed in Go; D3 decoupling) carrying the metrics scrape label + harness-run.
  3. Wait for the Job to run the full load to completion.
  4. Assert the chain stayed live (post-load WaitCaughtUp on node-0).
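Step 1 of the flow can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the harness's code: the `renderProfile` helper, the `{{.ChainID}}` and `{{.Endpoints}}` placeholders, the profile shape, and the example hostnames are all hypothetical. The only details taken from this PR are that the chain id and the fleet's EVM endpoints are substituted into a profile read from the seiload-profiles ConfigMap.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"text/template"
)

// profileParams holds the per-run values substituted into the profile.
// The field names are hypothetical.
type profileParams struct {
	ChainID   string
	Endpoints string // JSON array literal of endpoint URLs
}

// renderProfile fills a profile template with the chain id and the
// JSON-encoded list of EVM endpoints.
func renderProfile(tmpl, chainID string, endpoints []string) (string, error) {
	quoted, err := json.Marshal(endpoints)
	if err != nil {
		return "", fmt.Errorf("encode endpoints: %w", err)
	}
	t, err := template.New("profile").Parse(tmpl)
	if err != nil {
		return "", fmt.Errorf("parse profile: %w", err)
	}
	var buf bytes.Buffer
	params := profileParams{ChainID: chainID, Endpoints: string(quoted)}
	if err := t.Execute(&buf, params); err != nil {
		return "", fmt.Errorf("render profile: %w", err)
	}
	return buf.String(), nil
}

func main() {
	// Stand-in for the profile text read from the ConfigMap.
	const tmpl = `{"chainId": "{{.ChainID}}", "endpoints": {{.Endpoints}}}`
	out, err := renderProfile(tmpl, "bench-example", []string{"http://rpc-0:8545", "http://rpc-1:8545"})
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

Encoding the endpoint list with `encoding/json` rather than joining strings by hand keeps the rendered profile valid JSON even if a URL contains characters that need escaping.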

Assert model (deliberate)

Pass/fail = Job completion + post-load chain liveness. A throughput/regression gate belongs in telemetry — a PromQL query over the run's metrics (the conventional path, like validate-release reads Grafana) — not parsed from the report. Left as a documented follow-up; the Job already carries the metrics scrape label so it's a drop-in.
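The Job-completion half of this assert model could look something like the sketch below. It assumes the harness inspects the Job's status conditions; `jobCondition` is a local stand-in that mirrors the Type, Status, and Message fields of the Kubernetes batch/v1 JobCondition, and `jobOutcome` is a hypothetical helper name, not a function from this PR.

```go
package main

import "fmt"

// jobCondition mirrors the Type/Status/Message fields of a Kubernetes
// batch/v1 JobCondition. It is a local stand-in, not the client-go type.
type jobCondition struct {
	Type    string // "Complete" or "Failed"
	Status  string // "True", "False", or "Unknown"
	Message string
}

// jobOutcome reports done=true once the Job has reached a terminal
// condition, and a non-nil error if that condition is Failed.
func jobOutcome(conds []jobCondition) (done bool, err error) {
	for _, c := range conds {
		if c.Status != "True" {
			continue // only conditions that currently hold are terminal
		}
		switch c.Type {
		case "Complete":
			return true, nil
		case "Failed":
			return true, fmt.Errorf("seiload job failed: %s", c.Message)
		}
	}
	return false, nil // still running; the caller keeps polling
}

func main() {
	done, err := jobOutcome([]jobCondition{
		{Type: "Failed", Status: "True", Message: "DeadlineExceeded"},
	})
	fmt.Println(done, err)
}
```

A poll loop would call this until `done` is true, then run the post-load liveness check only when `err` is nil.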

Decoupling / cleanup

  • seiload runs from its own manifest; the profile is platform-owned (read from the cluster CM, not vendored).
  • Per-run profile CM + seiload Job carry sei.io/harness-run; t.Cleanup deletes them on normal exit, the label-GC sweep backstops abnormal exit.
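The per-run label described above might be built along these lines. The label key `sei.io/harness-run` comes from this PR; the helper names `runLabels` and `runSelector` and the example run id are hypothetical, and the sketch assumes an equality-based label selector is what a cleanup or sweep would match on.

```go
package main

import "fmt"

// runLabelKey is the label stamped on every per-run resource.
const runLabelKey = "sei.io/harness-run"

// runLabels returns the label set to attach to the per-run profile
// ConfigMap and the seiload Job.
func runLabels(runID string) map[string]string {
	return map[string]string{runLabelKey: runID}
}

// runSelector returns an equality-based label selector that matches
// only the resources created by one run.
func runSelector(runID string) string {
	return runLabelKey + "=" + runID
}

func main() {
	fmt.Println(runLabels("run-example"))
	fmt.Println(runSelector("run-example"))
}
```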

Verification

gofmt clean · go build ./... clean (no integration code in prod deps) · go vet -tags integration clean · golangci-lint 0 issues · go test -c -tags integration compiles TestBenchmark · skips without SEI_NODE_CLUSTER.

Next

Live smoke on harbor (demonstration) + platform PR to point the load CronJob at the new image and update the podMonitor selector to sei.io/harness-run.

🤖 Generated with Claude Code

Completes the load suite: after provisioning the chain + RPC fleet, render
the platform seiload profile (from the seiload-profiles ConfigMap) with the
fleet's EVM endpoints, apply seiload's own Job manifest as a decoupled unit,
wait for it to run the full load, and assert the chain stayed live under it.

- seiload runs from its own manifest (embedded template, parameterized) — its
  Job spec is not constructed in Go. Profile is read from the platform CM, not
  vendored.
- Pass/fail = Job completion + post-load chain liveness. A throughput/regression
  gate belongs in telemetry (a PromQL query over the run's metrics); the Job
  carries the metrics scrape label so that gate can be added later.
- Per-run profile CM + seiload Job carry sei.io/harness-run for the GC sweep;
  t.Cleanup deletes them on normal exit.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@cursor

cursor Bot commented Jun 23, 2026


PR Summary

Medium Risk
Creates real cluster Jobs/ConfigMaps and long-running load against nightly infrastructure; failures are mostly test-scoped, but misconfiguration could leave labeled resources until cleanup or future GC.

Overview
TestBenchmark now runs the full load path instead of skipping after provision: it provisions the chain + RPC fleet, then invokes runSeiload to apply load and assert liveness.

The harness renders a platform-owned profile from the cluster seiload-profiles ConfigMap (chain id + JSON-quoted EVM RPC URLs), writes a per-run profile ConfigMap labeled sei.io/harness-run, and applies a parameterized seiload Batch Job from an embedded YAML template (not built in Go). It waits for Job success (with pod log tail on failure) and checks every RPC follower is still caught up after load. Pass/fail is Job completion + post-load sync; throughput gates are explicitly deferred to PromQL/metrics.

spec and env gain SEILOAD_IMAGE, profile, duration, and commit inputs (envInt for DURATION_MINUTES). Cleanup uses t.Cleanup for the Job and profile CM; harness comments note label-GC sweep is still a pending platform backstop.

Reviewed by Cursor Bugbot for commit 557897e.

- systems: capture the failed seiload pod log into the fatal (the failure-time
  signal a Job condition message can't give); add a self-terminating Job
  activeDeadlineSeconds independent of the harness ctx.
- sei-network: widen the post-load liveness check to every follower, not just
  node-0 (a half-dead fleet would otherwise ship green).
- k8s/dissenter + comment-register: soften the runLabelKey comment to stop
  claiming a label-GC sweep that isn't shipped yet (pending platform
  deliverable; DeletionPolicy cascade + t.Cleanup cover normal exit); use the
  chain's resolved namespace directly for the seiload Job (no env re-resolve);
  fix the stale namespace field comment.
- idiom: trim the runSeiload doc to present-state.

chainId 713714 confirmed correct (seid GetEVMChainID falls through to
DefaultChainID for bench-* chains). RBAC Role + podMonitor selector flip +
gc label sweep are platform prereqs, tracked separately.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@bdchatham

Collaborator Author

/xreview — 4 lenses (idiom · systems · k8s-dissenter · sei-network), RESOLVED

Comprehensive blinded review; spine ratified, fixes applied in the latest commit.

| # | Finding | Lens | Grade | Resolution |
|---|---|---|---|---|
| F1 | Failed seiload pod log discarded; the Job condition message alone can't diagnose why | systems | advisory (3am-critical) | waitJob now tails the pod log into the fatal |
| F2 | Job has no activeDeadlineSeconds; only the harness ctx bounds a hung seiload | systems | correctness (defense-in-depth) | self-terminating cap (durationMin+15)*60 |
| F3 | Post-load liveness checked only node-0; a half-dead fleet ships green | sei-network | advisory | widened to every follower |
| F4 | runLabelKey comment claims a label-GC sweep that isn't shipped | dissenter + idiom | correctness (honesty) | softened: sweep is a pending platform deliverable; cascade + t.Cleanup cover normal exit |
| F5 | namespace re-resolved from env in runSeiload | dissenter | advisory | use the chain's resolved namespace directly |
| F6 | runSeiload doc narrates rationale/future-plan | idiom | style | trimmed to present-state |
| | chainId 713714 | sei-network | COMPATIBLE | verified: seid GetEVMChainID falls through to DefaultChainID for bench-* chains |
| | CM key/mount, namespace non-divergence, pod-template GC label, DeletionPolicy, cleanup ordering | | all COMPATIBLE | conceded by the dissenter |
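The F2 cap, (durationMin+15)*60, works out as in the sketch below. The formula is from the review; the helper name `activeDeadlineSeconds` and the 30-minute example are hypothetical. The result is an `int64` because the Kubernetes batch/v1 JobSpec field `activeDeadlineSeconds` is a 64-bit integer.

```go
package main

import "fmt"

// activeDeadlineSeconds returns the Job's self-terminating cap in seconds:
// the configured load duration plus a 15-minute margin.
func activeDeadlineSeconds(durationMin int) int64 {
	return int64(durationMin+15) * 60
}

func main() {
	// A 30-minute load gets a 45-minute cap: (30+15)*60 = 2700 seconds.
	fmt.Println(activeDeadlineSeconds(30))
}
```

Because the cap is enforced by the Job controller, a hung seiload pod is terminated even if the harness process that created it has already exited.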

Platform prereqs (tracked separately, NOT this PR): a harness-SA Role (configmaps/jobs/seinetworks create), the podMonitor selector flip to app.kubernetes.io/name=seiload, and the sei.io/harness-run gc-cronjob label sweep — the platform cutover + Step-2 deliverables.

Live smoke of TestBenchmark on harbor in progress (the demonstration).

bdchatham merged commit 5bf74c0 into main on Jun 23, 2026 (5 checks passed)
bdchatham deleted the feat/wsi-seiload-drive branch on June 23, 2026 01:37