feat(test): TestBenchmark drives seiload (WS-I load suite) by bdchatham · Pull Request #430 · sei-protocol/sei-k8s-controller

bdchatham · 2026-06-23T00:53:32Z

WS-I — complete the load suite (TestBenchmark)

Builds on the merged foundation (#428) + SDK Labels/DeletionPolicy (#429). After provisioning the chain + RPC fleet, TestBenchmark now drives seiload and asserts the chain survived the load.

Flow

Render the platform seiload profile (read from the seiload-profiles ConfigMap) with the per-run chain id + the fleet's EVM endpoints; write a per-run profile CM stamped sei.io/harness-run.
Apply seiload's own Job manifest (an embedded, parameterized template; the Job spec is not constructed in Go, per D3 decoupling), carrying the metrics scrape label and sei.io/harness-run.
Wait for the Job to run the full load to completion.
Assert the chain stayed live (post-load WaitCaughtUp on node-0).

Assert model (deliberate)

Pass/fail = Job completion + post-load chain liveness. A throughput/regression gate belongs in telemetry, as a PromQL query over the run's metrics (the conventional path, the same way validate-release reads Grafana), rather than being parsed from the seiload report. It is left as a documented follow-up; because the Job already carries the metrics scrape label, that gate is a drop-in.
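The assert model reduces to two signals. In this sketch the names `loadOutcome` and `verdict` are hypothetical, and liveness is checked on every follower, as the review later required (F3), rather than on node-0 alone. An empty follower set is treated as a failure so the check can't pass vacuously. Throughput is intentionally absent, matching the deferral to telemetry.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// loadOutcome captures the two signals asserted after a run.
// Type and field names are hypothetical, for illustration only.
type loadOutcome struct {
	JobSucceeded bool            // seiload Job reached a Complete condition
	CaughtUp     map[string]bool // follower name -> still caught up after load
}

// verdict returns nil when the run passes: the Job completed and every
// follower is still caught up. It does not look at throughput.
func verdict(o loadOutcome) error {
	if !o.JobSucceeded {
		return errors.New("seiload job did not complete")
	}
	if len(o.CaughtUp) == 0 {
		return errors.New("no followers were checked for liveness")
	}
	var lagging []string
	for name, ok := range o.CaughtUp {
		if !ok {
			lagging = append(lagging, name)
		}
	}
	if len(lagging) > 0 {
		sort.Strings(lagging) // stable error text regardless of map order
		return fmt.Errorf("followers not caught up after load: %v", lagging)
	}
	return nil
}
```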

Decoupling / cleanup

seiload runs from its own manifest; the profile is platform-owned (read from the cluster CM, not vendored).
Per-run profile CM + seiload Job carry sei.io/harness-run; t.Cleanup deletes them on normal exit, and a label-GC sweep (still a pending platform deliverable) is meant to backstop abnormal exit.
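Stamping and selecting by the run label is plain map work. This sketch reuses the `runLabelKey` identifier named in review finding F4, but its value and the helpers `runLabels` and `selector` are assumptions for illustration. The selector string is the equality-based form Kubernetes accepts (`key=value`, comma-separated), so a sweep, or a manual `kubectl get jobs,configmaps -l <selector>`, can find leftovers from an aborted run.

```go
package main

import (
	"sort"
	"strings"
)

// runLabelKey names the per-run label; the value is assumed to match the
// PR's sei.io/harness-run label.
const runLabelKey = "sei.io/harness-run"

// runLabels returns the labels stamped on each per-run object (profile
// ConfigMap, seiload Job and its pod template), merged with extra labels
// such as a metrics scrape label. The run label wins on key conflict.
func runLabels(runID string, extra map[string]string) map[string]string {
	out := make(map[string]string, len(extra)+1)
	for k, v := range extra {
		out[k] = v
	}
	out[runLabelKey] = runID
	return out
}

// selector renders labels as an equality-based label selector string,
// sorted so the output is stable across runs.
func selector(labels map[string]string) string {
	parts := make([]string, 0, len(labels))
	for k, v := range labels {
		parts = append(parts, k+"="+v)
	}
	sort.Strings(parts)
	return strings.Join(parts, ",")
}
```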

Verification

gofmt clean · go build ./... clean (no integration code in prod deps) · go vet -tags integration clean · golangci-lint 0 issues · go test -c -tags integration builds TestBenchmark, which skips without SEI_NODE_CLUSTER.

Completes the load suite: after provisioning the chain + RPC fleet, render the platform seiload profile (from the seiload-profiles ConfigMap) with the fleet's EVM endpoints, apply seiload's own Job manifest as a decoupled unit, wait for it to run the full load, and assert the chain stayed live under it.

- seiload runs from its own manifest (embedded template, parameterized); its Job spec is not constructed in Go. The profile is read from the platform CM, not vendored.
- Pass/fail = Job completion + post-load chain liveness. A throughput/regression gate belongs in telemetry (a PromQL query over the run's metrics); the Job carries the metrics scrape label so that gate can be added later.
- Per-run profile CM + seiload Job carry sei.io/harness-run for the GC sweep; t.Cleanup deletes them on normal exit.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

cursor · 2026-06-23T00:53:40Z

PR Summary

Medium Risk
Creates real cluster Jobs/ConfigMaps and long-running load against nightly infrastructure; failures are mostly test-scoped, but misconfiguration could leave labeled resources until cleanup or future GC.

Overview
TestBenchmark now runs the full load path instead of skipping after provision: it provisions the chain + RPC fleet, then invokes runSeiload to apply load and assert liveness.

The harness renders a platform-owned profile from the cluster seiload-profiles ConfigMap (chain id + JSON-quoted EVM RPC URLs), writes a per-run profile ConfigMap labeled sei.io/harness-run, and applies a parameterized seiload Batch Job from an embedded YAML template (not built in Go). It waits for Job success (with pod log tail on failure) and checks every RPC follower is still caught up after load. Pass/fail is Job completion + post-load sync; throughput gates are explicitly deferred to PromQL/metrics.

spec and env gain SEILOAD_IMAGE, profile, duration, and commit inputs (envInt parses DURATION_MINUTES). Cleanup uses t.Cleanup for the Job and profile CM; harness comments note that the label-GC sweep is still a pending platform backstop.

Reviewed by Cursor Bugbot for commit 557897e. Bugbot is set up for automated code reviews on this repo.

- systems: capture the failed seiload pod log into the fatal (the failure-time signal a Job condition message can't give); add a self-terminating Job activeDeadlineSeconds independent of the harness ctx.
- sei-network: widen the post-load liveness check to every follower, not just node-0 (a half-dead fleet would otherwise ship green).
- k8s/dissenter + comment-register: soften the runLabelKey comment to stop claiming a label-GC sweep that isn't shipped yet (pending platform deliverable; DeletionPolicy cascade + t.Cleanup cover normal exit); use the chain's resolved namespace directly for the seiload Job (no env re-resolve); fix the stale namespace field comment.
- idiom: trim the runSeiload doc to present-state.

chainId 713714 confirmed correct (seid GetEVMChainID falls through to DefaultChainID for bench-* chains). RBAC Role + podMonitor selector flip + gc label sweep are platform prereqs, tracked separately.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

bdchatham · 2026-06-23T01:04:15Z

/xreview — 4 lenses (idiom · systems · k8s-dissenter · sei-network), RESOLVED

Comprehensive blinded review; spine ratified, fixes applied in the latest commit.

#	Finding	Lens	Grade	Resolution
F1	Failed seiload pod log discarded — Job condition message alone can't diagnose why	systems	advisory (3am-critical)	`waitJob` now tails the pod log into the fatal
F2	Job has no `activeDeadlineSeconds` — only the harness ctx bounds a hung seiload	systems	correctness (defense-in-depth)	self-terminating cap `(durationMin+15)*60`
F3	Post-load liveness checked only node-0 — a half-dead fleet ships green	sei-network	advisory	widened to every follower
F4	`runLabelKey` comment claims a label-GC sweep that isn't shipped	dissenter + idiom	correctness (honesty)	softened — sweep is a pending platform deliverable; cascade + t.Cleanup cover normal exit
F5	namespace re-resolved from env in runSeiload	dissenter	advisory	use the chain's resolved namespace directly
F6	runSeiload doc narrates rationale/future-plan	idiom	style	trimmed to present-state
—	chainId 713714	sei-network	COMPATIBLE	verified: seid `GetEVMChainID` → `DefaultChainID` for bench-* chains
—	CM key/mount, namespace non-divergence, pod-template GC label, DeletionPolicy, cleanup ordering	all	COMPATIBLE	conceded by the dissenter
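F2's cap maps directly onto the Job's `spec.activeDeadlineSeconds`, a standard batch/v1 field: once it elapses, Kubernetes terminates the Job's pods and marks the Job failed, whether or not the harness context is still alive. A small helper, with hypothetical names, keeps the arithmetic in one place; a 30-minute load gets (30+15)*60 = 2700 seconds.

```go
package main

import "fmt"

// deadlineSlackMinutes is the headroom on top of the configured load
// duration, matching the (durationMin+15)*60 cap described in F2.
const deadlineSlackMinutes = 15

// activeDeadlineSeconds returns the value for the Job's
// spec.activeDeadlineSeconds (an int64 in the batch/v1 API).
func activeDeadlineSeconds(durationMin int) (int64, error) {
	if durationMin <= 0 {
		return 0, fmt.Errorf("duration must be positive, got %d minutes", durationMin)
	}
	return int64(durationMin+deadlineSlackMinutes) * 60, nil
}
```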

Platform prereqs (tracked separately, NOT this PR): a harness-SA Role (configmaps/jobs/seinetworks create), the podMonitor selector flip to app.kubernetes.io/name=seiload, and the sei.io/harness-run gc-cronjob label sweep — the platform cutover + Step-2 deliverables.

Live smoke of TestBenchmark on harbor in progress (the demonstration).

bdchatham merged commit 5bf74c0 into main Jun 23, 2026
5 checks passed

bdchatham deleted the feat/wsi-seiload-drive branch June 23, 2026 01:37

This was referenced Jun 23, 2026

fix(test): provision chains with working seid config (smoke-proven) #431

Merged

feat(test): TestChaosSuite — chaos faults via the harness (network-partition) #432

Merged

bdchatham merged 2 commits into main from feat/wsi-seiload-drive

bdchatham commented Jun 23, 2026

Uh oh!

cursor Bot commented Jun 23, 2026 •

edited

Loading

Uh oh!

bdchatham commented Jun 23, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

bdchatham commented Jun 23, 2026

WS-I — complete the load suite (TestBenchmark)

Flow

Assert model (deliberate)

Decoupling / cleanup

Verification

Next

Uh oh!

cursor Bot commented Jun 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Summary

Uh oh!

bdchatham commented Jun 23, 2026

/xreview — 4 lenses (idiom · systems · k8s-dissenter · sei-network), RESOLVED

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

cursor Bot commented Jun 23, 2026 •

edited

Loading