fix(snappy): reject >9-byte uvarint continuation runs (no integer-overflow panic) by zclawz · Pull Request #9 · blockblaz/zig-snappy

zclawz · 2026-05-09T09:05:54Z

Bug

uvarint() in snappy.zig panics with integer overflow when the input contains 10+ continuation bytes (b >= 0x80).

Reproduction in zeam Hive (test gossip: ignores malformed ssz — 27 zeam_devnet4 cascade failures from this single panic):

thread 64 panic: integer overflow
snappy.zig:73:11 in uvarint     s += 7;
snappy.zig:105:27 in decodedLen const varint = uvarint(src);
snappy.zig:265:33 in decodeWithMax

The simulator publishes 1024 bytes of 0xef on a valid gossip topic. Each 0xef has the high bit set, so uvarint keeps looping. The shift counter s: u6 reaches 63 after 9 iterations, and s += 7 on the 10th iteration overflows the u6, panicking the network thread and crashing the client.

Fix

Reject as malformed (return the corrupt sentinel bytesRead == 0) as soon as the loop sees its 10th continuation byte. A 64-bit varint is at most 10 bytes long, so further bytes can't possibly be valid — and we must bail before the next s += 7.

Also tightened decodedLen()'s corrupt check from <= 0 to == 0. The old check was a no-op for any sentinel: bytesRead is usize and the previous "corrupt" path returned -%i + 1 which wraps to a huge usize, never <= 0.

Edge cases covered (new tests)

1024 bytes of 0xef (the actual Hive payload) → Corrupt, no panic
Exactly 10 continuation bytes, no terminator → corrupt
11 continuation bytes followed by a valid terminator → corrupt
10th byte with value > 1 (overflows u64) → corrupt (existing rule, still enforced)
10th byte with value == 1 (max u64 = 0xffff_ffff_ffff_ffff) → 10 bytes consumed, valid
Empty buffer → bytesRead == 0, no read
decode and decodeWithMax on the 1024×0xef payload → error.Corrupt

Verification

$ zig build test --summary all
Build Summary: 3/3 steps succeeded; 9/9 tests passed

The regression test decode rejects garbage with long continuation run reproduces the exact zeam Hive scenario.

Companion change

A follow-up PR on blockblaz/zeam will bump this dep so the live Hive failure clears. cc @ch4r10t33r

A malformed snappy header that consists of 10 or more continuation bytes (byte >= 0x80) walks the loop in uvarint() past the u6 shift counter. After 9 iterations s == 63, and on the 10th iteration s += 7 panics with `integer overflow` in safe builds. Triggered in zeam by the Hive `gossip: ignores malformed ssz` test (1024 bytes of 0xef on a valid gossip topic). Reject malformed varints as soon as we see continuation byte 10 by returning the corrupt sentinel (`bytesRead == 0`) instead of pushing the shift counter further. Also tighten decodedLen() to compare against 0 explicitly (the previous `<= 0` on a usize was a no-op for the wrap- around sentinel that the old code tried to use).

@min

…and tests Changes from review on PR #9: uvarint: - Restructure loop to iterate buf[0..@min(len,10)] so the 10-byte budget is encoded in the loop bound itself; `s += 7` can no longer push s:u6 past 63 by construction. - Replace `i >= 9` continuation-bit-still-set guard with `i == 9` (loop now cannot reach i == 10). - Move struct-level rationale onto Varint with explicit doc that `bytesRead == 0` is overloaded across truncated and malformed cases, and what to do if uvarint is ever made pub (return ?Varint or isize byte count, matching Go's encoding/binary.Uvarint convention). - Strip the gossip-payload reference from the library function comment; downstream context lives in the test comment. - Multi-line all early-return Varint{} literals to match the existing multi-line style. decodedLen: - Revert the `<= 0` -> `== 0` clarification. For usize the two are identical and the original PR description framed it as a tightening, which is misleading. Tests: - Symmetric `expectEqual(@as(u64, 0), vN.value)` on every malformed sub-case for parity with the bytesRead asserts. - New: 9 continuation bytes with no 10th byte (truncated input falls through to the post-loop return, distinct from the i == 9 guard). - New: non-canonical `[0x80, 0x00]` encoding of zero. Pins current behaviour (varint format does not require minimal encoding) so a future canonicalisation tightening is a deliberate breaking change.

zclawz · 2026-05-09T10:04:51Z

Thanks for the careful review. Pushed a01e503 addressing 1, 2, 3, 4, 5, 6, 7, 9, 10. Filing 8 as a follow-up issue.

1. Restructured to for (buf[0..@min(buf.len, 10)], 0..) so the 10-byte budget lives in the loop bound. Loop can't reach i == 10 by construction; s += 7 can't push s:u6 past 63 by construction. Both internal guards are now i == 9 (the in-loop continuation-bit-still-set case stays as a guard because we still need to bail before the next iteration's s += 7 runs).

2. Added a doc block on Varint explicitly calling out that bytesRead == 0 is overloaded across truncated and malformed inputs, that the Go reference distinguishes via negative byte count, and what to do if uvarint is ever made pub (switch to ?Varint or isize).

3. Comment rewritten: "10th byte (i == 9) with continuation bit still set: varint exceeds the 64-bit budget. Bail before the next s += 7 would overflow the u6 shift counter."

4. Reverted. Original <= 0 is preserved; for usize it's identical to == 0 and the previous "tightened" framing was misleading.

5. All early-return Varint{} literals now multi-line, matching the existing style.

6. Every malformed sub-case now asserts both bytesRead == 0 and value == 0.

7. "malformed gossip payload" line is gone from the library; the regression detail stays in the test comment where it belongs.

8. Out of scope, will file a follow-up issue: a 1k-iteration random-bytes fuzz against decode/decodeWithMax would catch sibling panics in runDecode and other malformed-input crashes. Linked here once filed.

9. Added: [0xff] * 9 exactly. This is the truncated-input codepath that falls through the post-loop return, which is distinct from the i == 9 guard. Comment in the test calls out the distinction.

10. Added: [0x80, 0x00] non-canonical encoding of zero. Pins current behaviour (the varint format doesn't require minimal encoding) so any future tightening is a deliberate breaking change rather than a silent drift.

Final diff: +88 / -42 in snappy.zig. zig build test 9/9 green. The 1024×0xef repro still returns error.Corrupt instead of panicking.

zclawz · 2026-05-09T10:05:35Z

Re: point 8 (fuzz follow-up) — I tried to file it as an issue here but issues are disabled on this repo. Sketch below for the next time we touch this file or open a tracker repo:

test "fuzz: decode never panics on random input" {
    const allocator = testing.allocator;
    var prng = std.Random.DefaultPrng.init(0xDEAD_BEEF_CAFE_F00D);
    const random = prng.random();

    var buf: [4096]u8 = undefined;
    var i: usize = 0;
    while (i < 1024) : (i += 1) {
        const len = random.intRangeAtMost(usize, 0, buf.len);
        random.bytes(buf[0..len]);
        if (decode(allocator, buf[0..len])) |out| { allocator.free(out); } else |_| {}
        if (decodeWithMax(allocator, buf[0..len], 16 * 1024 * 1024)) |out| { allocator.free(out); } else |_| {}
    }
}

Goal: catch sibling panics in runDecode (tag-literal length boundaries, tagCopy* offset casts, the dst[d - offset ..] slicing, etc.) the same way #9 caught the one in uvarint. Deterministic seed so CI failures reproduce locally; both decode and decodeWithMax exercised so we cover the early-bail path and the full-decode path. Out of scope for #9; happy to open a separate PR for it after this one merges.

build.zig.zon was still pinned at 0.0.3 even though tag v0.0.4 was cut on the decodeWithMax commit. Bump to 0.0.5 to reflect the uvarint integer-overflow fix landing on master via #9. Skipping 0.0.4 because that tag already exists pointing at e95e113 and re-tagging would break consumers that resolved against the existing tag.

@min

Move from the pre-review commit on the fix branch (9521bbcb) to the tagged release v0.0.5 (050f529b) which includes the merged review feedback: - 10-byte budget is now encoded directly in the uvarint loop bound (`buf[0..@min(buf.len, 10)]`) so `s += 7` cannot overflow s:u6 by construction. - Varint sentinel semantics are documented on the type itself. - Test coverage extended: 9-byte truncation, non-canonical `[0x80, 0x00]` zero encoding, symmetric value/bytesRead asserts on every malformed case. Network tests remain green (14/14, including the validateSnappyBlockHeader and snappyz.decodeWithMax regression tests added in this PR). Refs: - https://github.com/blockblaz/zig-snappy/releases/tag/v0.0.5 - blockblaz/zig-snappy#9

@min

…ve `gossip: ignores malformed ssz`) (#855) * network: stop integer-overflow panic on malformed gossip SSZ (Hive #390) Hive `gossip: ignores malformed ssz` (test 390) publishes 1024 bytes of `0xef` on a valid block topic and expects the client to remain healthy. zeam_devnet4 instead panicked on the network thread: thread 64 panic: integer overflow handleMsgFromRustBridge -> snappyz.decodeWithMax -> uvarint s += 7; // s: u6, after 9 iterations s == 63 Every `0xef` is a varint continuation byte; after the 10th continuation byte `s += 7` overflows the u6 shift counter. The crash took down the client and cascaded into 26 follow-up zeam_devnet4 failures (every test that re-used the second node afterwards reported "client should connect and send a request: Elapsed(())"). Two layers of defense: 1. Bump zig-snappy to a fix that rejects >9-byte continuation runs in uvarint() instead of panicking (blockblaz/zig-snappy#9). The decoder now returns error.Corrupt for the exact Hive payload. 2. Add validateSnappyBlockHeader() and call it in handleMsgFromRustBridge before invoking the third-party decoder. Uses zeam's own uvarint (multiformats) which has proper bounds checking and per-topic size limits. This guards against any future regression in the upstream decoder and rejects oversized declared sizes before any heap allocation. Edge cases covered by new tests in pkgs/network/src/ethlibp2p.zig: - 1024 bytes of 0xef (the actual Hive payload) -> rejected, no panic - 11 continuation bytes followed by a valid terminator -> rejected - Empty payload -> rejected - Declared size > MAX_GOSSIP_BLOCK_SIZE -> rejected - Header-only buffer with non-zero declared size -> rejected - Well-formed header + payload byte -> accepted - Zero-length declared block (single 0x00 byte) -> accepted - snappyz.decodeWithMax on 1024 bytes of 0xef -> error.Corrupt (no panic) Refs: https://hive.leanroadmap.org/suite.html?suiteid=1778305924-e14785654316449971f512c113089da3.json&suitename=gossip&client=zeam_devnet4#test-390 Companion: blockblaz/zig-snappy#9 * deps: pin zig-snappy to tagged release v0.0.5 Move from the pre-review commit on the fix branch (9521bbcb) to the tagged release v0.0.5 (050f529b) which includes the merged review feedback: - 10-byte budget is now encoded directly in the uvarint loop bound (`buf[0..@min(buf.len, 10)]`) so `s += 7` cannot overflow s:u6 by construction. - Varint sentinel semantics are documented on the type itself. - Test coverage extended: 9-byte truncation, non-canonical `[0x80, 0x00]` zero encoding, symmetric value/bytesRead asserts on every malformed case. Network tests remain green (14/14, including the validateSnappyBlockHeader and snappyz.decodeWithMax regression tests added in this PR). Refs: - https://github.com/blockblaz/zig-snappy/releases/tag/v0.0.5 - blockblaz/zig-snappy#9 * network: address PR #855 review feedback Substantive - 1. Naming inversion fixed. The pre-PR `validateGossipSnappyHeader` was used by RPC frame parsers (parseRequestFrame/parseResponseFrame), not by gossip. Renamed it to `validateRpcSnappyHeader` and gave the new gossip guard the (now correct) name `validateGossipSnappyHeader`. Both sit on top of a shared `validateSnappyHeader` helper that takes max_size as a parameter so both call sites use the same code path. - 2. Bool return replaced with a typed error union. Three different attacker shapes (corrupt varint, oversized claim, missing body) now surface as three distinct `SnappyHeaderValidationError` variants so ops/metrics can tell them apart instead of getting a single "malformed snappy header" line. - 3. Sender peer_id now appears in every gossip rejection log. The previous log only printed topic and length even though sender_peer_id was a fn arg \u2014 attribution is back. - 4. Peer scoring follow-up flagged with TODO at the gossip rejection site. Out of scope for the panic fix; tracked for a future PR. - 5. `writeFailedBytes` is now sample-rate limited via a process-local `shouldPersistMalformedDump()` (1-of-1024 + always-the-first). A peer spamming garbage gossip can no longer fill the disk with one debug file per message. - 6. Two-layer-defense exit criteria documented in the validator's doc comment. Three reasons the local guard stays permanent: per-topic size pre-allocation gating, typed errors for ops, and a safety net against future upstream regressions. - 7. Header-only-with-non-zero-body is no longer mislabeled as "malformed header". The header is well-formed; the body is missing. Distinct `HeaderWithoutBody` variant + distinct log reason + distinct dump label "snappy_truncated". RPC frame parser maps it to `error.Incomplete` (body bytes may not have arrived yet on this read), not to a fatal frame error. - 8. Varint-decoded-twice contract pinned with comments at both the validator and the gossip call site. Notes that both decoders MUST agree on strict `>` for the size-limit comparison. - 9. TODO at the per-kind size switch noting that attestations/ aggregations rarely approach MAX_RPC_MESSAGE_SIZE and tighter limits belong here. Nits - 10. Validator doc comment is generalised. No longer hardcodes the 10-byte continuation anecdote in the library function; the regression detail lives in the test. - 11. The decodeWithMax test is now explicitly labelled "REGRESSION CANARY" with an inline note that an unexpected panic here means the upstream zig-snappy pin has been downgraded below v0.0.5. Fixes the cryptic-failure-mode complaint. - 12. New test pinning the body-shorter-than-declared case as accepted by the header validator (decoder is authoritative for body integrity). Doc comment on `validateSnappyHeader` calls this out explicitly. - 13. Boundary test added: declared == max_size accepted, declared == max_size + 1 rejected. Pins the strict-`>` contract against upstream's `if (block.blockLen > max_size)`. The 1-byte-mismatch scenario the reviewer was worried about will surface here loudly instead of silently. - 14. TODO marker added at MAX_GOSSIP_BLOCK_SIZE noting the per-peer memory-pressure surface is real; revisit once the leanSpec lands. --------- Co-authored-by: zclawz <zclawz@users.noreply.github.com>

zclawz mentioned this pull request May 9, 2026

fix(network): stop integer-overflow panic on malformed gossip SSZ (Hive gossip: ignores malformed ssz) blockblaz/zeam#855

Merged

ch4r10t33r requested review from anshalshukla and g11tech May 9, 2026 09:53

ch4r10t33r approved these changes May 9, 2026

View reviewed changes

ch4r10t33r merged commit 8321cfa into master May 9, 2026
4 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(snappy): reject >9-byte uvarint continuation runs (no integer-overflow panic)#9

fix(snappy): reject >9-byte uvarint continuation runs (no integer-overflow panic)#9
ch4r10t33r merged 2 commits into
masterfrom
fix/uvarint-overflow-on-long-continuation-runs

zclawz commented May 9, 2026

Uh oh!

zclawz commented May 9, 2026

Uh oh!

zclawz commented May 9, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

zclawz commented May 9, 2026

Bug

Fix

Edge cases covered (new tests)

Verification

Companion change

Uh oh!

zclawz commented May 9, 2026

Uh oh!

zclawz commented May 9, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants