Skip to content

feat(protoutils): UnmarshalWithLimit — pre-decode allocation estimate#3615

Open
wen-coding wants to merge 3 commits into
mainfrom
wen/unmarshal_with_limit
Open

feat(protoutils): UnmarshalWithLimit — pre-decode allocation estimate#3615
wen-coding wants to merge 3 commits into
mainfrom
wen/unmarshal_with_limit

Conversation

@wen-coding

@wen-coding wen-coding commented Jun 21, 2026

Copy link
Copy Markdown
Contributor

Summary

Adds allocation-bounded protobuf unmarshaling to bound heap allocation during decode.

  • UnmarshalWithLimit[T](bz []byte, limitBytes int) — google v2 proto
  • UnmarshalGogoWithLimit(bz []byte, msg gogoproto.Message, limitBytes int) — gogoproto (Tendermint P2P types)

Both functions run Scan/ScanAny (wireguard schema checks) first, then walk the raw wire bytes with allocEstimate before calling Unmarshal. If the estimated heap allocation exceeds limitBytes, the message is rejected without allocating anything.

How allocEstimate works

The function walks raw wire bytes using protoreflect.MessageDescriptor for the field schema and protoregistry/gogoproto.MessageType for Go struct sizes. It accumulates:

Wire construct What is counted
Message field (repeated or singular) sizeof(GoStruct) from type registry
Repeated message field +pointerSize per element (backing array pointer)
bytes field sliceHeaderSize + len(val)
string field stringHeaderSize + len(val)
Packed repeated varint element count × Go element size (1/4/8 bytes) — accounts for up to 8× wire-to-Go size difference
Map field mapEntryOverhead per entry (runtime map internals) + key/value content
Unknown field (any wire type) full wire record: tagLen + n
Wire-type mismatch counted as unknown bytes (tagLen + n) — no panic
Singular field repeated N times all N occurrences accumulated — accounts for gogoproto merge behavior

The estimate is conservative (may over-count). A 10× estimation error is acceptable — the goal is to distinguish legitimate messages from wire payloads whose decoded size far exceeds their wire size.

Gogoproto bridge

Tendermint P2P types (tmproto.*) are generated by protoc-gen-gogofaster and do not implement google.golang.org/protobuf/proto.Message. UnmarshalGogoWithLimit uses github.com/golang/protobuf/proto.MessageReflect (deprecated but the only bridge) to obtain the protoreflect.MessageDescriptor, and falls back to gogoproto.MessageType for struct size lookups.

Singular field merge behavior

gogoproto merges repeated wire occurrences of a singular field (e.g. Proposal.last_commit) by appending sub-fields. Per-occurrence wireguard checks pass individually, but allocEstimate recurses into every wire occurrence and accumulates the totals, so the full merged allocation is estimated correctly.

Map fields

Map fields are counted at mapEntryOverhead per entry (64 bytes) for runtime overhead plus key/value content from the recursive walk. This covers the hmap struct and bucket array slots conservatively. Currently no wireguard-protected message has map fields; support is included for future messages.

Test plan

  • SmallMessageAccepted — small repeated-message field passes
  • ManyEmptyEntriesRejected — 10k empty repeated entries caught before Unmarshal
  • LargePayloadRejected — large bytes field caught
  • UnknownBytesFieldCounted — unknown bytes field counted toward limit
  • SmallUnknownFieldsAccepted — small unknown scalars pass cleanly
  • ResultIsCorrect — decoded result is correct when within limit
  • PackedVarintAmplificationCounted — 200k packed uint64 (1 byte/wire, 8 bytes/Go) rejected
  • WireTypeMismatchNoPanic — mismatched wire type: no panic, counted as unknown
  • TruncatedInputReturnsError — truncated mid-field returns error, not panic
  • ZeroLimitPanicslimitBytes <= 0 panics (programming error)
  • LargeMapRejected — 50k map entries exceed 1MB limit
  • SmallMapAccepted — 2-entry map passes
  • GogoManyEmptySignaturesRejected — gogoproto Commit with 10k signatures caught
  • GogoSmallCommitAccepted — legitimate gogoproto Commit passes
  • GogoSingularFieldMerge — 500 × last_commit each with 50 signatures exceeds 1MB

Load tests (alloc_scan_load_test.go):

  • TestUnmarshalWithLimit_MaxBlockAccepted — max-sized autobahn Block (2000 txs × 1024 bytes) accepted at 4MB limit
  • BenchmarkUnmarshalWithLimit_MaxBlock — ~242µs for 2MB block, 8.5 GB/s
  • BenchmarkUnmarshalWithLimit_AmplifiedPayload — ~316ns for 20KB payload rejected before Unmarshal

@cursor

cursor Bot commented Jun 21, 2026

Copy link
Copy Markdown

PR Summary

Medium Risk
Touches untrusted P2P/block decode paths and uses conservative estimates that could reject edge-case legitimate messages if limits are set too tight; logic is complex but well-tested.

Overview
Adds allocation-bounded protobuf unmarshaling so small wire payloads cannot trigger huge heap use during decode. New UnmarshalWithLimit (google v2) and UnmarshalGogoWithLimit (gogoproto / Tendermint P2P) run existing Scan / ScanAny, then allocEstimate on raw wire bytes, and only call Unmarshal if the conservative heap estimate is within limitBytes.

allocEstimate walks tags with protowire, using descriptors and Go struct sizes from protoregistry / gogoproto.MessageType. It counts message structs, repeated pointers, bytes/strings, packed varint amplification, map entry overhead, unknown fields, wire-type mismatches (as unknown bytes), and all occurrences of singular fields (for gogoproto merge). Corrupt or truncated wire returns an error before decode.

Extensive unit, gogo, and load/benchmark tests cover amplification rejection, legitimate max blocks, and fast rejection paths.

Reviewed by Cursor Bugbot for commit 15f42cb. Bugbot is set up for automated code reviews on this repo. Configure here.

@github-actions

github-actions Bot commented Jun 21, 2026

Copy link
Copy Markdown

The latest Buf updates on your PR. Results from workflow Buf / buf (pull_request).

BuildFormatLintBreakingUpdated (UTC)
✅ passed✅ passed✅ passed✅ passedJun 22, 2026, 9:39 PM

@github-actions

Copy link
Copy Markdown

The latest Buf updates on your PR. Results from workflow Buf / buf (pull_request).

BuildFormatLintBreakingUpdated (UTC)
✅ passed✅ passed✅ passed✅ passedJun 21, 2026, 5:38 PM

Comment thread sei-tendermint/internal/protoutils/msg.go
@codecov

codecov Bot commented Jun 21, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 46.59091% with 94 lines in your changes missing coverage. Please review.
✅ Project coverage is 58.12%. Comparing base (f83a111) to head (15f42cb).
⚠️ Report is 4 commits behind head on main.

Files with missing lines Patch % Lines
sei-tendermint/internal/protoutils/alloc_scan.go 42.85% 75 Missing and 9 partials ⚠️
sei-tendermint/internal/protoutils/msg.go 65.51% 5 Missing and 5 partials ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #3615      +/-   ##
==========================================
- Coverage   59.01%   58.12%   -0.90%     
==========================================
  Files        2224     2151      -73     
  Lines      182814   174443    -8371     
==========================================
- Hits       107893   101399    -6494     
+ Misses      65220    64029    -1191     
+ Partials     9701     9015     -686     
Flag Coverage Δ
sei-chain-pr 59.48% <46.59%> (?)
sei-db 70.41% <ø> (-0.22%) ⬇️
sei-db-state-db ?

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
sei-tendermint/internal/protoutils/msg.go 76.59% <65.51%> (-17.85%) ⬇️
sei-tendermint/internal/protoutils/alloc_scan.go 42.85% <42.85%> (ø)

... and 75 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Comment thread sei-tendermint/internal/protoutils/msg.go
Comment thread sei-tendermint/internal/protoutils/msg.go
Comment thread sei-tendermint/internal/protoutils/alloc_scan.go Outdated
}

const (
pointerSize = 8 // pointer on 64-bit

@pompon0 pompon0 Jun 22, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reflect.TypeFor[...]().Size() ?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

case protoreflect.Int32Kind, protoreflect.Uint32Kind,
protoreflect.Sint32Kind, protoreflect.EnumKind:
return countVarintsInPacked(bz) * 4
default: // int64, uint64, sint64

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make it explicit case, panic on default

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

for len(bz) > 0 {
_, n := protowire.ConsumeVarint(bz)
if n < 0 {
break // malformed; allocEstimate will catch the outer error

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wdym by "will catch the outer error"?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

case protowire.VarintType:
_, n := protowire.ConsumeVarint(data)
if n < 0 {
return 0, fmt.Errorf("alloc scan: malformed varint field %d: %w", num, protowire.ParseError(n))

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this shared error message prefix looks like it can be prepended at the call site.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

// the unknown fields blob, allocating exactly tagLen+n bytes.
total += tagLen + n
} else if fd.IsList() {
total += sliceHeaderSize + 8 // over-counts slice header per element, noise at 1MB scale

@pompon0 pompon0 Jun 22, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

technically this can cause up to 32x higher estimate than the real thing.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed, scalarElementSize now returns 1 for bool, 4 for 32-bit kinds, 8 for 64-bit kinds. The 32× overcount for bool (1 byte wire, was returning 32) is gone.

@pompon0

pompon0 commented Jun 22, 2026

Copy link
Copy Markdown
Contributor

Note that this impl is meaningfully slower than the current wireguard approach because of the use of descriptors (note that proto.Unmarshal does not use protoreflect on the hot path). Probably still acceptable overhead though, just saying.

Comment thread sei-tendermint/internal/protoutils/alloc_scan.go
Comment thread sei-tendermint/internal/protoutils/alloc_scan.go
for len(bz) > 0 {
_, n := protowire.ConsumeVarint(bz)
if n < 0 {
break // partial count on malformed input; proto.Unmarshal will reject it

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we return an error here, rather than a partial count?

@wen-coding wen-coding Jun 22, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, returning an error now

Comment thread sei-tendermint/internal/protoutils/alloc_scan.go Outdated
Comment thread sei-tendermint/internal/protoutils/msg.go

@cursor cursor Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes using default effort and found 2 potential issues.

There are 3 total unresolved issues (including 1 from previous review).

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 4aec24b. Configure here.

Comment thread sei-tendermint/internal/protoutils/alloc_scan.go Outdated
Comment thread sei-tendermint/internal/protoutils/alloc_scan.go Outdated
Adds allocation-bounded protobuf unmarshaling. Before calling Unmarshal,
walks raw wire bytes with allocEstimate to bound the heap allocation the
decode would make. Returns an error if the estimate exceeds limitBytes,
without allocating anything.

Public API:
  UnmarshalWithLimit[T](bz []byte, limitBytes int) — google v2 proto
  UnmarshalGogoWithLimit(bz []byte, msg, limitBytes int) — gogoproto

allocEstimate accounts for:
- sizeof(GoStruct) per message occurrence (protoregistry / gogoproto)
- backing arrays for bytes/string fields
- packed varint amplification (up to 8× wire-to-Go size difference)
- map field runtime overhead (hmap + bucket slots)
- unknown fields and wire-type mismatches (full wire record: tagLen+n)
- all wire occurrences of singular fields (gogoproto merge behavior)
- truncated or corrupt input returns an error immediately

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@wen-coding wen-coding force-pushed the wen/unmarshal_with_limit branch from 60a08ff to a5c8a03 Compare June 22, 2026 21:30
wen-coding and others added 2 commits June 22, 2026 14:34
A zero or negative limit is a programming error. Panic with a descriptive
message rather than silently rejecting all messages.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Map fields satisfy IsList() (maps encode as repeated entries), so the
previous ordering added a spurious sliceHeaderSize per map occurrence.
A Go map is not a slice — check IsMap first and skip the slice header.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants