feat(test): WS-I sei integration suite — TestBenchmark provision spine [DRAFT]#428

Merged: bdchatham merged 3 commits into main from feat/wsi-sei-integration-suite-load on Jun 23, 2026

@bdchatham

Collaborator

WS-I Step 1 (foundation) — Go-native nightly harness as go test targets

First increment of the Go-native test harness that replaces the Chaos-Mesh Workflow DAG + seitask Task pods + workflow-vars ConfigMap. Draft — lands the architecture + provisioning spine; the seiload drive + S3 report is the next increment (marked TODO + t.Skip).

What's here

  • test/integration/harness_test.go: shared machinery. spec/chain (local Go state, replacing workflow-vars); provision (SDK in-process: CreateNetwork with 4 validators + N RPC SeiNodes, each waited Running→caught-up→EVM-serving); teardown via t.Cleanup; the sei.io/harness-run label const; and the env gate.
  • test/integration/benchmark_test.go: TestBenchmark (load suite).
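
The Running→caught-up→EVM-serving wait described above can be pictured as a sequence of polled gates. The following is a minimal sketch under stated assumptions, not the harness code: the `gate` type, `waitGates`, `passAfter`, and the node name `rpc-0` are all hypothetical, and the checks are stand-ins for real status probes.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// gate is one readiness condition (hypothetical type); check reports whether
// the condition currently holds.
type gate struct {
	name  string
	check func(ctx context.Context) (bool, error)
}

// waitGates polls each gate in order until it passes or ctx ends. It logs
// before and after every gate so a stall can be attributed to one node and
// one condition.
func waitGates(ctx context.Context, node string, interval time.Duration, logf func(string, ...any), gates []gate) error {
	for _, g := range gates {
		logf("node %s: waiting for %s", node, g.name)
		for {
			ok, err := g.check(ctx)
			if err != nil {
				return fmt.Errorf("node %s, gate %s: %w", node, g.name, err)
			}
			if ok {
				break
			}
			select {
			case <-ctx.Done():
				return fmt.Errorf("node %s, gate %s: %w", node, g.name, ctx.Err())
			case <-time.After(interval):
			}
		}
		logf("node %s: %s ok", node, g.name)
	}
	return nil
}

// passAfter returns a stand-in check that passes on its n-th call.
func passAfter(n int) func(context.Context) (bool, error) {
	calls := 0
	return func(context.Context) (bool, error) {
		calls++
		return calls >= n, nil
	}
}

// runDemo waits a fake node through three gates and returns the log lines.
func runDemo() ([]string, error) {
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()
	var lines []string
	logf := func(format string, args ...any) {
		lines = append(lines, fmt.Sprintf(format, args...))
	}
	gates := []gate{
		{"running", passAfter(1)},
		{"caught-up", passAfter(3)},
		{"evm-serving", passAfter(2)},
	}
	err := waitGates(ctx, "rpc-0", 5*time.Millisecond, logf, gates)
	return lines, err
}

func main() {
	lines, err := runDemo()
	for _, l := range lines {
		fmt.Println(l)
	}
	if err != nil {
		fmt.Println("error:", err)
	}
}
```

In a test the `logf` argument would typically be `t.Logf`, and the context would carry the suite deadline so a gate that never passes returns an error naming the node and the gate.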

Architecture decisions (WS-I LLD)

  • D3 Go orchestrator, SDK in-process; seiload + chaos are decoupled units the suite applies, not constructs.
  • D4 plain go test targets in _test.go (not a CLI binary) — Go's _test.go rule guarantees zero test code in any production binary; //go:build integration hides it from default CI. Verified: go build ./... clean, no integration code in cmd/ deps.
  • D5 in-cluster CronJob vehicle; one go test -c image, one CronJob per target (-test.run TestX). Targets: TestBenchmark / TestChaosSuite / TestChainUpgrade / TestRelease.
  • Stages wholesale seitask-runner removal: imports only sdk/sei (+ k8s provider).
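
With one CronJob per target selected by -test.run, a mistyped pattern is worth guarding against, because go test reports success when the pattern selects no tests. The sketch below is one possible pre-flight check, not part of this PR: `matchedTargets` and `checkRunPattern` are hypothetical helpers, and only top-level patterns (no "/" subtest elements) are handled.

```go
package main

import (
	"fmt"
	"regexp"
)

// targets lists the four suite entry points named in the design.
var targets = []string{"TestBenchmark", "TestChaosSuite", "TestChainUpgrade", "TestRelease"}

// matchedTargets returns the names a top-level run pattern selects. Like
// go test, it treats the pattern as an unanchored regular expression.
// Subtest patterns containing "/" are outside this sketch.
func matchedTargets(pattern string, names []string) ([]string, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, fmt.Errorf("compile %q: %w", pattern, err)
	}
	var out []string
	for _, n := range names {
		if re.MatchString(n) {
			out = append(out, n)
		}
	}
	return out, nil
}

// checkRunPattern returns an error when the pattern would select nothing,
// which a plain test run would otherwise report as a pass.
func checkRunPattern(pattern string) error {
	got, err := matchedTargets(pattern, targets)
	if err != nil {
		return err
	}
	if len(got) == 0 {
		return fmt.Errorf("-test.run %q matches no target", pattern)
	}
	return nil
}

func main() {
	for _, p := range []string{"TestBenchmark", "TestBenchmrk"} {
		fmt.Printf("%s -> %v\n", p, checkRunPattern(p))
	}
}
```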

Verification

gofmt clean · go build ./... clean · go vet -tags integration ./test/integration clean · go test -c -tags integration → binary exposes TestBenchmark · skips cleanly without SEI_NODE_CLUSTER.
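
The "skips cleanly without SEI_NODE_CLUSTER" behavior amounts to an environment gate evaluated before any provisioning. A minimal sketch follows, assuming a two-argument `envOr` (the PR names the helper but not its signature) and a hypothetical `SEI_HARNESS_NAMESPACE` variable used only for illustration.

```go
package main

import (
	"fmt"
	"os"
)

// clusterEnv is the variable the PR description names as the gate.
const clusterEnv = "SEI_NODE_CLUSTER"

// envOr returns the value of key, or def when key is unset or empty.
// The signature is an assumption.
func envOr(key, def string) string {
	if v := os.Getenv(key); v != "" {
		return v
	}
	return def
}

// gate reports the target cluster and whether the suite should run.
// lookup is injected so the decision can be exercised without touching
// the real environment.
func gate(lookup func(string) string) (cluster string, run bool) {
	cluster = lookup(clusterEnv)
	return cluster, cluster != ""
}

func main() {
	cluster, run := gate(os.Getenv)
	if !run {
		// In a test this branch would be a t.Skipf call.
		fmt.Printf("skip: %s is not set\n", clusterEnv)
		return
	}
	ns := envOr("SEI_HARNESS_NAMESPACE", "nightly") // hypothetical variable and default
	fmt.Printf("would provision against cluster %q in namespace %q\n", cluster, ns)
}
```

Inside a test function the gate would sit at the top of the body, ahead of the first SDK call, so a default run without the variable creates nothing.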

Next increment

seiload as a decoupled unit (apply its own manifest w/ evmEndpoints(), stamped harness-run) → wait → read S3 report → assert TPS. Then mark ready + Coral/Bugbot/CI gate.

🤖 Generated with Claude Code

bdchatham and others added 2 commits June 22, 2026 16:37
…n spine

Go-native nightly harness as plain `go test` targets (WS-I). Stage 1: the
TestBenchmark load suite's provisioning spine — CreateNetwork (4 validators)
+ N standalone RPC SeiNodes via the sei SDK in-process, each waited to
Running + caught-up + EVM-serving, torn down via t.Cleanup.

Conventions / isolation:
- test/integration/*_test.go, //go:build integration: Go's _test.go rule
  guarantees zero test code links into any production binary (controller /
  seid / seitask); the build tag hides it from default `go test ./...`.
- ships only via `go test -c -tags integration` as a standalone binary in
  its own image, run by one in-cluster CronJob per target (-test.run TestX).
- imports ONLY sdk/sei (+ k8s provider) — no internal/seitask or
  internal/taskruntime — so the seitask runner deletes wholesale once the
  four targets (Benchmark/ChaosSuite/ChainUpgrade/Release) land.

seiload + chaos are decoupled units the suite will APPLY, not construct
(seiload from its own manifest; chaos from platform-owned fault CRs). The
seiload drive + S3 report is the next increment (marked TODO + t.Skip).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…model hardening

Resolves the 3-lens xreview findings on the integration-suite foundation:

- F1 (correctness, all 3 lenses): the sei.io/harness-run GC label was
  declared but never stamped (SDK specs had no Labels field; renderNetwork
  stamped none) — the sole abnormal-exit reaper selected nothing. Add
  Labels to NetworkSpec/NodeSpec, thread into renderNetwork (was unlabeled)
  + renderNode (caller labels merge UNDER the canonical role/seinetwork,
  which win on collision); provision stamps runLabelKey=runID on the network
  + every node. Locked by render_test label assertions.
- F2 (correctness): a -test.timeout breach panics and bypasses t.Cleanup;
  derive ctx via signal.NotifyContext(SIGTERM) so the activeDeadlineSeconds
  grace period triggers teardown before SIGKILL. (-test.timeout 0 CronJob
  requirement recorded in the LLD.)
- F5 (systems): per-gate t.Logf progress so a stall is localizable in real
  time (which node, which gate) instead of one terminal error.
- idiom advisories: env->envOr, for-range, Skipf naming what was provisioned,
  honest runLabelKey comment.

F3/F4 (-test.timeout default, -test.run no-match false-green) are recorded
as CronJob/CI run-model requirements (LLD) — enforced with the platform
wiring. SDK Labels is a Brandon-approved one-way-door addition.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@bdchatham

Collaborator Author

/xreview round-1 — RESOLVED (3 blinded lenses: idiom · systems · k8s-dissenter)

Dispatched iterative review on the foundation before more accretes. The dissenter did not ratify on first pass; all findings now resolved in 3997e2c.

| # | Finding | Grade | Resolution |
|---|---------|-------|------------|
| F1 | sei.io/harness-run GC label declared but never stamped: SDK specs had no Labels field and renderNetwork stamped none, so the sole abnormal-exit reaper selected nothing (3-lens convergence) | correctness | Added Labels to NetworkSpec/NodeSpec, threaded into both renderers (network was unlabeled; node merges caller labels UNDER canonical role/seinetwork). provision stamps it on network + every node. Locked by render tests. |
| F2 | -test.timeout breach panics → t.Cleanup bypassed (verified) | correctness | ctx via signal.NotifyContext(SIGTERM) so the activeDeadlineSeconds grace triggers teardown; -test.timeout 0 recorded as a CronJob requirement |
| F3/F4 | default 10m timeout truncates 90m suite; -test.run no-match exits 0 = false green (verified) | correctness | recorded as CronJob/CI run-model requirements (LLD), enforced with the platform wiring |
| F5 | no per-gate progress log | systems | per-gate t.Logf (running→caught-up→EVM serving) |
| split | idiom vetted _test.go-as-lib correct; dissenter flagged LLD contradiction | | _test.go retained (only the suffix gives Go-enforced no-prod-leak); stale LLD cmd-binary/internal/-pkg refs scrubbed |
| idiom | env→envOr, for-range, Skipf, honest label comment | style | applied |

Ratified by all three: partial-provision teardown (append-before-wait), probe client, sequential provisioning, SDK consumption. SDK Labels is a Brandon-approved one-way-door. Ledger: crd-migration-ws/design/xreview/wsi-step1-testbenchmark.md. Still draft pending the seiload drive increment (→ Round 2 with the CronJob run-model enforcement).
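
The ratified partial-provision teardown (append-before-wait) can be illustrated in isolation. This is a sketch under assumptions rather than the harness code: the `chain` type here tracks plain names instead of SDK handles, the create/wait/delete callbacks are stand-ins, and deleting in reverse creation order is a choice made for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// chain records every resource that was created, in creation order.
type chain struct {
	created []string
}

// provision creates each resource and records it BEFORE waiting on it, so a
// resource that never becomes ready is still visible to teardown.
func (c *chain) provision(names []string, create, wait func(name string) error) error {
	for _, n := range names {
		if err := create(n); err != nil {
			return fmt.Errorf("create %q: %w", n, err)
		}
		c.created = append(c.created, n) // append before wait
		if err := wait(n); err != nil {
			return fmt.Errorf("wait %q: %w", n, err)
		}
	}
	return nil
}

// teardown deletes everything that was recorded, newest first, and keeps
// going past individual failures so one bad delete does not strand the rest.
func (c *chain) teardown(del func(name string) error) error {
	var errs []error
	for i := len(c.created) - 1; i >= 0; i-- {
		n := c.created[i]
		if err := del(n); err != nil {
			errs = append(errs, fmt.Errorf("delete %q: %w", n, err))
		}
	}
	return errors.Join(errs...)
}

// demoTeardown provisions three resources where the last one never becomes
// ready, then tears down and returns the deletion order.
func demoTeardown() (deleted []string, err error) {
	c := &chain{}
	ok := func(string) error { return nil }
	stall := func(n string) error {
		if n == "rpc-1" {
			return errors.New("never caught up")
		}
		return nil
	}
	provErr := c.provision([]string{"net", "rpc-0", "rpc-1"}, ok, stall)
	tdErr := c.teardown(func(n string) error {
		deleted = append(deleted, n)
		return nil
	})
	return deleted, errors.Join(provErr, tdErr)
}

func main() {
	deleted, err := demoTeardown()
	fmt.Println("deleted:", deleted)
	fmt.Println("error:", err)
}
```

Had the append happened after the wait, the stalled `rpc-1` would exist in the cluster but be missing from the teardown list.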

@bdchatham bdchatham marked this pull request as ready for review June 22, 2026 23:53
@cursor

cursor Bot commented Jun 22, 2026

PR Summary

Medium Risk
Label merge rules affect how SeiNetwork/SeiNode CRs are created (peer/chaos selectors must stay correct); integration tests only run when SEI_NODE_CLUSTER is set and currently skip after provision.

Overview
Adds Labels on NetworkSpec / NodeSpec and threads them through k8s rendering: SeiNetwork gets caller labels only (still no sei.io/seinetwork); SeiNode merges caller labels then overwrites with canonical sei.io/role and sei.io/seinetwork on key conflicts. Render tests cover harness-run labels and collision behavior.
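
The node-side merge rule (caller labels first, canonical keys overwrite on conflict) reduces to the order of two map copies. A minimal sketch, assuming a hypothetical `mergeLabels` helper and illustrative label values; only the key names come from this PR.

```go
package main

import "fmt"

// mergeLabels copies caller labels first and canonical labels second, so a
// canonical key always wins when both maps define it. Neither input is
// modified.
func mergeLabels(caller, canonical map[string]string) map[string]string {
	out := make(map[string]string, len(caller)+len(canonical))
	for k, v := range caller {
		out[k] = v
	}
	for k, v := range canonical {
		out[k] = v
	}
	return out
}

func main() {
	caller := map[string]string{
		"sei.io/harness-run": "run-xyz",
		"sei.io/role":        "custom", // collides with a canonical key
	}
	canonical := map[string]string{
		"sei.io/role":       "rpc",   // illustrative value
		"sei.io/seinetwork": "bench", // illustrative value
	}
	fmt.Println(mergeLabels(caller, canonical))
}
```

The run label survives because no canonical key shadows it, while the caller's attempt to set the role is overridden, which keeps selectors that depend on the canonical labels stable.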

Introduces test/integration (//go:build integration): shared harness (provision / teardown, sei.io/harness-run stamping, env gates) and TestBenchmark, which provisions a 4-validator network + 2 RPC nodes via the SDK in-process, handles SIGTERM for cleanup, then t.Skip until seiload is wired.
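
The SIGTERM handling mentioned above relies on `signal.NotifyContext` from the standard library, which cancels a context when the signal arrives instead of letting the process die. The sketch below shows the shape under assumptions: `runUntilSignal` and `demo` are hypothetical names, and the self-sent signal (Unix-only, via `syscall.Kill`) stands in for the kubelet terminating the pod.

```go
package main

import (
	"context"
	"fmt"
	"os/signal"
	"syscall"
	"time"
)

// runUntilSignal derives a context that is cancelled on SIGTERM, runs work
// with it, and runs cleanup once work returns for any reason.
func runUntilSignal(work func(ctx context.Context) error, cleanup func()) error {
	ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGTERM)
	defer stop()
	defer cleanup()
	return work(ctx)
}

// demo sends SIGTERM to its own process mid-run (Unix-only) and reports
// whether cleanup ran and what the work function returned.
func demo() (cleaned bool, err error) {
	err = runUntilSignal(func(ctx context.Context) error {
		if err := syscall.Kill(syscall.Getpid(), syscall.SIGTERM); err != nil {
			return err
		}
		select {
		case <-ctx.Done():
			return ctx.Err() // cancelled by the signal
		case <-time.After(2 * time.Second):
			return nil // the signal was not observed in time
		}
	}, func() { cleaned = true })
	return cleaned, err
}

func main() {
	cleaned, err := demo()
	fmt.Println("cleanup ran:", cleaned)
	fmt.Println("work returned:", err)
}
```

One design point worth noting: once the signal fires the derived context is already cancelled, so teardown calls that take a context would need a fresh one (for example with its own short timeout) rather than the cancelled suite context.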

Reviewed by Cursor Bugbot for commit bd15399. Bugbot is set up for automated code reviews on this repo. Configure here.

	if ch.network != nil {
		if err := ch.network.Delete(ctx); err != nil {
			errs = append(errs, fmt.Errorf("delete network %q: %w", ch.network.Name(), err))
		}

Teardown orphans validator SeiNodes

High Severity

provision creates a genesis SeiNetwork without deletionPolicy: Delete, and teardown only deletes RPC SeiNodes plus that network. With the API default Retain, deleting the network orphans its validator children instead of removing them. Those controller-created validators never get sei.io/harness-run, so neither normal t.Cleanup nor the documented label-GC sweep reaps them, leaking CRs and workloads in the shared nightly namespace on every run.

Reviewed by Cursor Bugbot for commit 3997e2c.

lint flagged "run-xyz" at 4 occurrences in render_test. Extract
testRunLabel/testRunID constants alongside the other k8s test fixtures.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@bdchatham

Collaborator Author

bugbot run

@bdchatham bdchatham merged commit 415ef5b into main Jun 23, 2026
5 checks passed
@bdchatham bdchatham deleted the feat/wsi-sei-integration-suite-load branch June 23, 2026 00:05

@cursor cursor Bot left a comment

✅ Bugbot reviewed your changes and found no new issues!

1 issue from previous review remains unresolved.

Reviewed by Cursor Bugbot for commit bd15399.

bdchatham added a commit that referenced this pull request Jun 23, 2026
…-sev follow-up to #428) (#429)

* fix(sdk,test): cascade-delete ephemeral chain validators (Bugbot high-sev)

Bugbot (PR #428) caught a real leak: provision creates the genesis
SeiNetwork, and teardown deletes that network — but the CRD defaults
DeletionPolicy=Retain, and the controller strips the validator children's
ownerRef under Retain (removeOwnerRef), so deleting the network ORPHANS the
controller-created validator SeiNodes. Those validators never carry
sei.io/harness-run (the harness doesn't create them), so neither t.Cleanup
nor the label-GC sweep reaps them — leaking 4 validators + PVCs per run in
the shared nightly namespace.

Add DeletionPolicy to the SDK NetworkSpec (string + DeletionDelete/Retain
constants, stdlib-only core; threaded into renderNetwork). The integration
harness sets DeletionDelete so an ephemeral chain cascade-deletes its
validators (+ PVCs) on teardown. Locked by a render_test assertion.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* docs(test,sdk): tighten comment register per expert review

Two-lens comment-standards review (idiom D10 + prose dual-audience):
- strip meta/ID cruft from code comments (review-tool + design-step/decision
  IDs belong in the PR, not source)
- drop migration-history framing (present-state only)
- fix one drift: TestBenchmark doc claimed seiload drive/report the body skips
- trim the agent-verbose package doc + de-duplicate the DeletionPolicy
  rationale to a single canonical home

No behavior change.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
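
The follow-up's shape (a string DeletionPolicy on the network spec, two constants, and a renderer that emits the field) can be sketched as below. Everything beyond the names DeletionPolicy, DeletionDelete, and renderNetwork is an assumption: DeletionRetain is inferred from "DeletionDelete/Retain", the struct fields and rendered layout are illustrative, and the validation branch is an addition for the example.

```go
package main

import "fmt"

// Policy values mirror the CRD's Delete and Retain options.
const (
	DeletionDelete = "Delete"
	DeletionRetain = "Retain" // name inferred, not confirmed by the source
)

// NetworkSpec is an illustrative subset of an SDK network spec.
type NetworkSpec struct {
	Name           string
	Validators     int
	Labels         map[string]string
	DeletionPolicy string // empty leaves the API default in effect
}

// renderNetwork builds an unstructured object for the network. The policy is
// emitted only when set, so existing callers keep the API default.
func renderNetwork(s NetworkSpec) (map[string]any, error) {
	spec := map[string]any{"validators": s.Validators} // hypothetical field name
	switch s.DeletionPolicy {
	case "":
		// Leave the field out and let the API server apply its default.
	case DeletionDelete, DeletionRetain:
		spec["deletionPolicy"] = s.DeletionPolicy
	default:
		return nil, fmt.Errorf("unknown deletion policy %q", s.DeletionPolicy)
	}
	return map[string]any{
		"kind": "SeiNetwork",
		"metadata": map[string]any{
			"name":   s.Name,
			"labels": s.Labels,
		},
		"spec": spec,
	}, nil
}

func main() {
	obj, err := renderNetwork(NetworkSpec{
		Name:           "bench",
		Validators:     4,
		Labels:         map[string]string{"sei.io/harness-run": "run-xyz"},
		DeletionPolicy: DeletionDelete, // ephemeral chain: cascade on teardown
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(obj["spec"])
}
```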