tool: add full-scan EVM logical digest #3611

Merged
blindchaser merged 6 commits into main from yiren/flatkv-full-scan
Jun 23, 2026
Conversation

@blindchaser

Contributor

Summary

Add an evm-logical-digest seidb operation for comparing EVM state across FlatKV and memIAVL at the same height. The command normalizes both backends into FlatKV physical keys, strips height-dependent value metadata, reports per-bucket bucket_digest values, and emits one FINAL_DIGEST line for backend comparison.

  • sei-db/tools/cmd/seidb/operations/evm_logical_digest.go: Adds the evm-logical-digest command with FlatKV native scanning and memIAVL snapshot scanning. FlatKV reads use RawGlobalIterator; memIAVL reads stream snapshot kvs records sequentially so scan order does not affect correctness.
  • sei-db/tools/cmd/seidb/operations/evm_logical_digest.go: Implements an order-independent bucket accumulator over sha256(len(key)||key||len(value)||value). The final digest combines the account, code, storage, and marker-adjusted legacy bucket digests, so a FlatKV-only migration-version row does not create a false mismatch.
  • sei-db/tools/cmd/seidb/operations/evm_logical_digest.go: Supports semantic memIAVL normalization by default and an opt-in translator mode through --memiavl-normalization translator. Semantic mode decodes raw EVM leaves directly; translator mode routes leaves through flatkv.ImportTranslator to validate the migration mapping.
  • sei-db/tools/cmd/seidb/operations/evm_logical_digest.go: Adds --inspect-bucket, prefix sharding, row listing, backend metadata details, and --find-hash support for isolating mismatched entries. memIAVL inspect honors the same normalization flag as the global digest, so diagnostics match the selected digest path.

Test plan

  • sei-db/tools/cmd/seidb/operations/evm_logical_digest_test.go: TestSemanticMemiavlDigestMatchesTranslatorForCoreEVMKeys verifies semantic normalization matches translator normalization for account, code, storage, and legacy buckets, including delete-equivalent zero storage and empty code rows.
  • sei-db/tools/cmd/seidb/operations/evm_logical_digest_test.go: TestSemanticMemiavlInspectMatchesTranslatorForCoreEVMKeys verifies inspect bucket results match translator output for all normalized buckets.
  • sei-db/tools/cmd/seidb/operations/evm_logical_digest_test.go: TestInspectMemiavlRejectsUnknownNormalizationBeforeOpeningSnapshot guarantees invalid --memiavl-normalization values are rejected before filesystem access.
  • Manual verification: go test ./sei-db/tools/cmd/seidb/operations.

@cursor

cursor Bot commented Jun 18, 2026


PR Summary

Low Risk
Read-only offline tooling under sei-db/tools with no node runtime or consensus path changes; risk is limited to operator misuse of large full scans on production data dirs.

Overview
Adds seidb evm-logical-digest, wired into the root CLI, so operators can compare EVM state between FlatKV and memIAVL at the same height without byte-for-byte physical dumps.

The command normalizes both backends to FlatKV-style physical keys, strips height-dependent value headers, and builds order-independent per-bucket digests (XOR of per-entry sha256(len||key||len||val)), then prints FINAL_DIGEST for account, code, storage, and legacy. FlatKV scans use RawGlobalIterator (via existing read-only open); memIAVL scans stream snapshot kvs sequentially instead of walking the mmap tree. Semantic memIAVL mode (default) decodes raw EVM leaves locally; translator mode routes leaves through flatkv.ImportTranslator to validate migration mapping.
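The order-independent bucket digest described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the names `entryHash` and `bucketDigest` are hypothetical, and the 8-byte big-endian length encoding is an assumption, since the summary only states `len||key||len||val`.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// entryHash returns sha256(len(key) || key || len(value) || value).
// Assumption: lengths are 8-byte big-endian; the PR does not state the
// actual width or byte order.
func entryHash(key, value []byte) [32]byte {
	h := sha256.New()
	var n [8]byte
	binary.BigEndian.PutUint64(n[:], uint64(len(key)))
	h.Write(n[:])
	h.Write(key)
	binary.BigEndian.PutUint64(n[:], uint64(len(value)))
	h.Write(n[:])
	h.Write(value)
	var out [32]byte
	copy(out[:], h.Sum(nil))
	return out
}

// bucketDigest is a hypothetical XOR accumulator over entry hashes.
// XOR is commutative and associative, so scan order does not matter.
type bucketDigest struct {
	sum   [32]byte
	count uint64
}

// Add folds one key/value row into the running digest.
func (b *bucketDigest) Add(key, value []byte) {
	e := entryHash(key, value)
	for i := range b.sum {
		b.sum[i] ^= e[i]
	}
	b.count++
}

func main() {
	rows := [][2]string{{"k1", "v1"}, {"k2", "v2"}, {"k3", "v3"}}

	var forward, reverse bucketDigest
	for _, r := range rows {
		forward.Add([]byte(r[0]), []byte(r[1]))
	}
	for i := len(rows) - 1; i >= 0; i-- {
		reverse.Add([]byte(rows[i][0]), []byte(rows[i][1]))
	}

	fmt.Println("same digest in either order:", forward.sum == reverse.sum)
}
```

The length prefixes keep `("ab", "c")` and `("a", "bc")` from hashing to the same entry. One property of an XOR accumulator worth keeping in mind is that adding the same row twice cancels it out, so the scheme relies on each key appearing once per bucket.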

Diagnostics include --inspect-bucket (prefix filter, sharding, list/details), --find-hash to locate a single diverging row, and legacy-bucket handling that XORs out the FlatKV-only migration/migration-version marker so memiavl-only nodes compare cleanly. Unit tests lock semantic vs translator parity for core EVM keys and key-filter behavior.
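Because XOR is its own inverse, the marker adjustment mentioned above amounts to XORing the marker row's entry hash into the legacy digest a second time. A small sketch under stated assumptions: the marker key and value below are placeholders rather than the real FlatKV row, the helper names are hypothetical, and the 8-byte big-endian length prefix is assumed.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// entryHash is sha256(len(key)||key||len(value)||value) with assumed
// 8-byte big-endian lengths.
func entryHash(key, value []byte) [32]byte {
	buf := make([]byte, 0, 16+len(key)+len(value))
	buf = binary.BigEndian.AppendUint64(buf, uint64(len(key)))
	buf = append(buf, key...)
	buf = binary.BigEndian.AppendUint64(buf, uint64(len(value)))
	buf = append(buf, value...)
	return sha256.Sum256(buf)
}

// xor returns the byte-wise XOR of two digests.
func xor(a, b [32]byte) [32]byte {
	for i := range a {
		a[i] ^= b[i]
	}
	return a
}

// digest folds every row of a bucket into one XOR digest. Map iteration
// order is random in Go, which is fine because XOR is order-independent.
func digest(rows map[string]string) [32]byte {
	var d [32]byte
	for k, v := range rows {
		d = xor(d, entryHash([]byte(k), []byte(v)))
	}
	return d
}

func main() {
	// Placeholder marker row; the real key and value are not given here.
	markerKey, markerVal := "migration-version", "1"

	memiavlLegacy := map[string]string{"legacy/a": "1", "legacy/b": "2"}
	flatkvLegacy := map[string]string{"legacy/a": "1", "legacy/b": "2", markerKey: markerVal}

	raw := digest(flatkvLegacy)
	adjusted := xor(raw, entryHash([]byte(markerKey), []byte(markerVal)))

	fmt.Println("raw digests match:     ", raw == digest(memiavlLegacy))
	fmt.Println("adjusted digests match:", adjusted == digest(memiavlLegacy))
}
```

With the marker present only on the FlatKV side, the raw comparison fails and the adjusted comparison succeeds.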

Reviewed by Cursor Bugbot for commit 3855d28. Bugbot is set up for automated code reviews on this repo. Configure here.

@github-actions

github-actions Bot commented Jun 18, 2026


The latest Buf updates on your PR. Results from workflow Buf / buf (pull_request).

| Build | Format | Lint | Breaking | Updated (UTC) |
| --- | --- | --- | --- | --- |
| ✅ passed | ✅ passed | ✅ passed | ✅ passed | Jun 22, 2026, 6:43 PM |

@codecov

codecov Bot commented Jun 18, 2026


Codecov Report

❌ Patch coverage is 24.27984% with 552 lines in your changes missing coverage. Please review.
✅ Project coverage is 57.98%. Comparing base (4d449ae) to head (3855d28).
⚠️ Report is 1 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| ...b/tools/cmd/seidb/operations/evm_logical_digest.go | 24.34% | 526 Missing and 24 partials ⚠️ |
| sei-db/tools/cmd/seidb/main.go | 0.00% | 2 Missing ⚠️ |

❌ Your patch check has failed because the patch coverage (24.27%) is below the target coverage (50.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #3611      +/-   ##
==========================================
- Coverage   59.01%   57.98%   -1.04%     
==========================================
  Files        2224     2151      -73     
  Lines      182699   174851    -7848     
==========================================
- Hits       107823   101388    -6435     
+ Misses      65197    64454     -743     
+ Partials     9679     9009     -670     
| Flag | Coverage Δ |
| --- | --- |
| sei-chain-pr | 24.72% <24.27%> (?) |
| sei-db | 70.41% <ø> (ø) |
| sei-db-state-db | ? |

Flags with carried forward coverage won't be shown. Click here to find out more.

| Files with missing lines | Coverage Δ |
| --- | --- |
| sei-db/tools/cmd/seidb/main.go | 0.00% <0.00%> (ø) |
| ...b/tools/cmd/seidb/operations/evm_logical_digest.go | 24.34% <24.34%> (ø) |

... and 104 files with indirect coverage changes

Comment thread sei-db/tools/cmd/seidb/operations/evm_logical_digest.go

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b7bb0fc7a5

Comment thread sei-db/tools/cmd/seidb/operations/evm_logical_digest.go
Comment thread sei-db/tools/cmd/seidb/operations/evm_logical_digest.go
bdchatham added a commit to sei-protocol/seictl that referenced this pull request Jun 20, 2026
…sk panic isolation (#211)

Adds the **full-keyspace digest gate**
([sei-chain#3611](sei-protocol/sei-chain#3611),
`seidb evm-logical-digest`) as a discrete sidecar task — the per-segment
boundary seal that closes the touched-key comparator's **cold-state
blind spot** (a key migrated wrong and never touched again is invisible
to per-block Layer 2). Plus three seams the systems-engineering review
called for. No "ShadowResultProducer" abstraction — that's deferred to
the 3rd producer (YAGNI).

### What's here
- **`sidecar/s3/emit.go`** — one S3 emission helper
(`StreamGzipNDJSON`/`StreamGzipJSON`/`StreamGzipFunc`), collapsing 3
duplicated gzip-pipe paths. Twofold integrity seal: an aws-chunked
SHA-256 **wire** checksum over the compressed body (io.Pipe
streaming/backpressure preserved) + an **uncompressed-payload** SHA-256
surfaced via `EmitResult` for out-of-band verification.
`result_compare`/`result_export` refactored onto it (no behavior
change).
- **`sidecar/engine`** — `recover()` in `runTask` turns a handler panic
into a failed `TaskResult` (+ `seictl_task_panics_total`) instead of
crashing the sidecar.
- **`sidecar/tasks/evm_logical_digest.go`** — the discrete task: shells
out to `seidb` for flatkv + memiavl (`semantic` + `translator`), asserts
**both** backends' opened version `== height` (fail-closed — no
wrong-height false match), parses the `FINAL_DIGEST`/per-bucket
contract, publishes an `EndpointDigestRecord`. `axes_proved`
deliberately omits **balance** (the semantic account digest zeroes it —
that axis stays the per-block comparator's job).

### Cross-review (systems-engineer + idiomatic-reviewer) — applied
- **Symmetric memiavl version assertion** (the flatkv-only check left a
wrong-height false-match hole if seidb clamps to the nearest snapshot).
- **`recover()` inside the s3 writer goroutine** — a panic there (e.g.
`MarshalJSON` over chain data) runs on a task-spawned goroutine
*outside* the engine's handler recover; converted to a returned error so
the upload aborts (no truncated-but-valid object) and the process
survives.
- **Dropped the empty-by-construction `uncompressed_sha256`** from the
published record (a record can't carry the hash of its own bytes; the
seal is out-of-band in the log/TaskResult).
- comment-precision fixes (memory bound is the uploader part-pool, not
"gzip window"; S3 checksum is per-part-composite for multipart) + a
`version:`-line length guard.
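The two recovery points discussed above (the engine's handler and a task-spawned writer goroutine) can be sketched as follows. The types and function names are hypothetical and the metric increment is omitted; the point is only that `recover()` takes effect on the goroutine that panicked, so each goroutine needs its own deferred recover.

```go
package main

import (
	"errors"
	"fmt"
)

// TaskResult is a hypothetical, simplified result type.
type TaskResult struct {
	OK  bool
	Err string
}

// runTask converts a handler panic into a failed TaskResult instead of
// letting it crash the process.
func runTask(handler func() error) (res TaskResult) {
	defer func() {
		if r := recover(); r != nil {
			res = TaskResult{Err: fmt.Sprintf("handler panic: %v", r)}
		}
	}()
	if err := handler(); err != nil {
		return TaskResult{Err: err.Error()}
	}
	return TaskResult{OK: true}
}

// safeGo runs fn on a new goroutine and reports either its error or a
// recovered panic on the returned channel. The recover in runTask cannot
// catch a panic raised here, because this is a different goroutine.
func safeGo(fn func() error) <-chan error {
	done := make(chan error, 1)
	go func() {
		defer func() {
			if r := recover(); r != nil {
				done <- fmt.Errorf("writer panic: %v", r)
			}
		}()
		done <- fn()
	}()
	return done
}

func main() {
	fmt.Printf("%+v\n", runTask(func() error { panic("boom") }))
	fmt.Printf("%+v\n", runTask(func() error { return errors.New("plain failure") }))

	// A handler that spawns a writer goroutine and waits for its outcome.
	res := runTask(func() error {
		return <-safeGo(func() error { panic("bad chain data") })
	})
	fmt.Printf("%+v\n", res)
}
```

Returning the panic as an error lets the caller abort the upload explicitly rather than closing the stream as if it had completed.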

### Notes
- `seidb` is **shelled out to** (configurable `seidbPath`), not
vendored. #3611 also needs a one-line registration fix
(`EvmLogicalDigestCmd()` isn't in `seidb`'s root `AddCommand`) — flagged
to the author.
- Trigger is the **out-of-band task API**; no controller/CRD change
(consistent with the avoided `ResultExportConfig` one-way door).
- **One-way-door surfaces for confirmation before a consumer reads
them:** the task-type string `"evm-logical-digest"`, the param field
names, and the `EndpointDigestRecord` schema.
- `GOWORK=off go build ./...` clean; `go test ./sidecar/...` green
(incl. new memiavl-version-mismatch + writer-panic regression tests);
`gofmt -s` clean.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
@blindchaser blindchaser requested a review from cody-littley June 22, 2026 15:26

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes using default effort and found 1 potential issue.

There are 2 total unresolved issues (including 1 from previous review).

Reviewed by Cursor Bugbot for commit d34312a. Configure here.

Comment thread sei-db/tools/cmd/seidb/operations/evm_logical_digest.go

@cody-littley cody-littley left a comment

Contributor

It looks like this code reads all account data into memory. If we have too much account data that risks OOMing the node we're using to compare. But since this tooling is temporary and won't be needed after the migration, I think it's probably ok for now as long as we can fit mainnet account state into memory on the machine where we are doing the testing.

@blindchaser

Contributor Author

> It looks like this code reads all account data into memory. If we have too much account data that risks OOMing the node we're using to compare. But since this tooling is temporary and won't be needed after the migration, I think it's probably ok for now as long as we can fit mainnet account state into memory on the machine where we are doing the testing.

Good point. This does not keep all EVM state in memory, but it does keep the merged account state, keyed by unique account address, so that nonce/codehash can be normalized into the FlatKV account buckets. Storage/code/legacy are streamed into the digest. This is a good callout, and we should watch memory usage during the pacific-1 comparison.
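The memory shape described in this reply can be pictured with a rough sketch: per-address account fields are buffered in a map so they can be merged into one row, while other rows are consumed as they arrive. All names and field layouts here are hypothetical and are not taken from `evm_logical_digest.go`.

```go
package main

import "fmt"

type address [20]byte

// accountState is a hypothetical merged account row.
type accountState struct {
	Nonce    uint64
	CodeHash [32]byte
}

// scanner buffers account fields per address and streams everything else.
type scanner struct {
	accounts map[address]*accountState // grows with the number of unique accounts
	streamed int                       // rows consumed without being retained
}

func newScanner() *scanner {
	return &scanner{accounts: make(map[address]*accountState)}
}

// account returns the buffered state for addr, creating it on first use.
func (s *scanner) account(addr address) *accountState {
	st, ok := s.accounts[addr]
	if !ok {
		st = &accountState{}
		s.accounts[addr] = st
	}
	return st
}

// OnNonce and OnCodeHash may arrive as separate leaves for the same address.
func (s *scanner) OnNonce(addr address, nonce uint64)   { s.account(addr).Nonce = nonce }
func (s *scanner) OnCodeHash(addr address, h [32]byte) { s.account(addr).CodeHash = h }

// OnStorage stands in for feeding a row straight into a digest; nothing is kept.
func (s *scanner) OnStorage(key, value []byte) { s.streamed++ }

func main() {
	s := newScanner()
	alice := address{0x01}

	s.OnNonce(alice, 7)
	s.OnStorage([]byte("slot0"), []byte("value0"))
	s.OnCodeHash(alice, [32]byte{0xaa})
	s.OnStorage([]byte("slot1"), []byte("value1"))

	fmt.Println("buffered accounts:", len(s.accounts))
	fmt.Println("streamed rows:", s.streamed)
}
```

In a layout like this, resident memory scales with the number of unique accounts rather than with total state size, which is why account count is the figure to watch on a large network.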

@blindchaser blindchaser added this pull request to the merge queue Jun 22, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Jun 22, 2026
@blindchaser blindchaser added this pull request to the merge queue Jun 23, 2026
Merged via the queue into main with commit a116a2d Jun 23, 2026
60 checks passed
@blindchaser blindchaser deleted the yiren/flatkv-full-scan branch June 23, 2026 02:33