
adaptive_export/sink: content_type silent-drop contract suite #57

Open
ConstanzeTU wants to merge 1 commit into ae-prod from entlein/ae-content-type-contract

@ConstanzeTU


Summary: Adds six default-suite Go tests (~15ms) in src/vizier/services/adaptive_export/internal/sink/content_type_contract_test.go. They pin the content_type Int64 schema invariant, the fastencode encoder emitting content_type as a JSON number, silent-drop detection when X-ClickHouse-Summary reports written_rows of zero, and the tolerate-missing-header policy. The top-of-file docstring chronicles the incident timeline (2026-05-23 redis_events, 2026-06-07 rig 6a25c85c, 2026-06-09 PR53 comment 4661115386 apples-vs-oranges) so future operators can grep their way to the contract. Closes the regression class so we stop hitting this every day.

Test Plan: 1) go test ./src/vizier/services/adaptive_export/internal/sink/ -run TestContract_ -v -- 6/6 PASS in 15ms. 2) Full sink suite (clickhouse_test, encode_bench_test, fastencode_test) still green. 3) gazelle ran to add the new file to BUILD.bazel. 4) arc lint clean (pre-commit hook green).

Type of change: /kind feature
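
For readers unfamiliar with the invariant behind I2, here is a minimal sketch of what "content_type as a JSON number" means on the wire. The helpers encodeRow and fieldIsJSONInteger are hypothetical names made up for this illustration; they are not the fastencode code or the test in this PR.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// encodeRow is a hypothetical stand-in for the real encoder. It writes
// content_type with strconv.AppendInt, so the value is a bare JSON number.
func encodeRow(contentType int64) []byte {
	dst := append([]byte(nil), `{"content_type":`...)
	dst = strconv.AppendInt(dst, contentType, 10)
	return append(dst, '}')
}

// fieldIsJSONInteger reports whether the named field of a JSON object is an
// unquoted integer. A quoted value such as "2" fails the check.
func fieldIsJSONInteger(obj []byte, name string) bool {
	var fields map[string]json.RawMessage
	if err := json.Unmarshal(obj, &fields); err != nil {
		return false
	}
	raw, ok := fields[name]
	if !ok {
		return false
	}
	_, err := strconv.ParseInt(string(raw), 10, 64)
	return err == nil
}

func main() {
	row := encodeRow(2)
	fmt.Printf("%s integer=%v\n", row, fieldIsJSONInteger(row, "content_type"))

	quoted := []byte(`{"content_type":"2"}`)
	fmt.Printf("%s integer=%v\n", quoted, fieldIsJSONInteger(quoted, "content_type"))
}
```

A contract test in this style would fail as soon as an encoder change started quoting the value.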

Consolidates the recurring content_type silent-drop incident class
into one default-suite test gate (6 tests, ~15ms):

  I1 TestContract_ContentTypeIsInt64InSchema
  I2 TestContract_FastEncodeContentTypeAsInt
  I3 TestContract_SilentDropDetected
  I3.b TestContract_SilentDropNotTriggeredOnSuccess
  I3.c TestContract_SilentDropToleratesMissingSummaryHeader
  I4 TestContract_HTTPEventsRoundTrip

Top-of-file docstring chronicles the incident timeline so future
operators can grep their way to the contract.
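
The three I3 cases can be sketched as one small check. This is an illustration under assumptions, not the sink's summaryWroteFewerThan: the function name looksLikeSilentDrop is hypothetical, and the sketch assumes the X-ClickHouse-Summary header is a JSON object whose written_rows counter may arrive either quoted or unquoted (json.Number accepts both).

```go
package main

import (
	"encoding/json"
	"fmt"
)

// insertSummary holds the one counter this sketch reads from the
// X-ClickHouse-Summary header value.
type insertSummary struct {
	WrittenRows json.Number `json:"written_rows"`
}

// looksLikeSilentDrop reports whether an insert of sentRows rows came back
// with written_rows equal to zero. A missing header or a missing counter is
// tolerated and treated as "no evidence of a drop".
func looksLikeSilentDrop(summaryHeader string, sentRows int) (bool, error) {
	if sentRows == 0 || summaryHeader == "" {
		return false, nil
	}
	var s insertSummary
	if err := json.Unmarshal([]byte(summaryHeader), &s); err != nil {
		return false, fmt.Errorf("parse X-ClickHouse-Summary: %w", err)
	}
	if s.WrittenRows == "" {
		return false, nil
	}
	written, err := s.WrittenRows.Int64()
	if err != nil {
		return false, fmt.Errorf("parse written_rows: %w", err)
	}
	return written == 0, nil
}

func main() {
	cases := []string{
		`{"read_rows":"0","written_rows":"0"}`,    // drop detected
		`{"read_rows":"0","written_rows":"1000"}`, // success
		``, // header missing: tolerated
	}
	for _, header := range cases {
		dropped, err := looksLikeSilentDrop(header, 1000)
		fmt.Printf("header=%q dropped=%v err=%v\n", header, dropped, err)
	}
}
```

In a real client the header value would come from the HTTP response, for example resp.Header.Get("X-ClickHouse-Summary") with net/http.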
@coderabbitai

coderabbitai Bot commented Jun 9, 2026


Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.



@ConstanzeTU

Author

AE bench-pprof results (local, 1000-row http_events batches)

Profiled the AE sink hot path with pprof via the existing encode_bench_test.go suite. Live on-pod pprof was attempted but blocked: AE refuses to start, with fatal: schema apply failed, against a wiped CH PVC (the PVC was cleared to recover the host from 8 days of DiskPressure; the CH operator now CrashLoops on the orphaned metadata/ + preprocessed_configs/). The benchmarks exercise the same code path: encodePixieRowsFast -> appendJSONValue -> appendJSONString -> WritePixieRows -> summaryWroteFewerThan.

Bench numbers (1000-row http_events batches, 32 cores)

| bench | ns/op | B/op | allocs/op |
| EncodePixieRowsFast_Pooled | 2,360,116 | 66,190 | 2,000 |
| EncodeJSONEachRow (slow path) | 10,246,895 | 5,236,127 | 57,034 |
| WritePixieRows_LocalHTTPLoopback | 3,498,161 | 132,700 | 2,231 |

CPU hotspots (15.99s of 18s wall, 88% sample density)

  • 50% appendJSONValue
  • 28% appendJSONString (string escaping)
  • 23% runtime.mapaccess2_faststr -- per-row row[col] map lookups
  • 14% time.Time.Format -- DateTime64 formatting per row

Heap (553 MB allocated over the bench)

  • 343 MB / 62% in time.Time.Format -- single biggest optimisation lever
  • 53 MB / 10% io.copyBuffer (HTTP request body copy)
  • 44 MB / 8% bytes.growSlice (encode buffer growing past pool initial)

Two concrete optimisations (real returns on the AE-pod-CPU NFR)

  1. time.Time.Format -> pooled scratch + AppendFormat for event_time. The comment at fastencode.go:170 says the path already uses AppendFormat, but the profile shows the string-returning time.Time.Format(layout) is still dominant. The call likely lives in normalisePixieValue on the slow path, or in a branch that bypasses the fast-path AppendFormat. Estimated saving if fully migrated: ~10-15% CPU and ~60% of total allocated bytes.
  2. Pre-resolved column-index slice in getCachedColumns instead of per-row row[col] map lookups. 23% of CPU is in mapaccess2_faststr; a fixed-order slice walk should bring that down to single-digit %.
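
The second idea can be sketched as follows: resolve column names to positions once, then walk each row by position so the per-row path performs no map lookups. This assumes rows are available as positional slices; resolveIndexes and project are hypothetical names and do not reflect getCachedColumns.

```go
package main

import "fmt"

// resolveIndexes maps each wanted column name to its position in the schema.
// It runs once per schema, so the only map lookups happen here.
func resolveIndexes(schemaCols, wanted []string) ([]int, error) {
	pos := make(map[string]int, len(schemaCols))
	for i, name := range schemaCols {
		pos[name] = i
	}
	idx := make([]int, len(wanted))
	for i, name := range wanted {
		p, ok := pos[name]
		if !ok {
			return nil, fmt.Errorf("unknown column %q", name)
		}
		idx[i] = p
	}
	return idx, nil
}

// project picks the wanted values out of a positional row using the
// pre-resolved indexes: a plain slice walk, no hashing per row.
func project(row []string, idx []int) []string {
	out := make([]string, len(idx))
	for i, p := range idx {
		out[i] = row[p]
	}
	return out
}

func main() {
	schema := []string{"time_", "req_path", "content_type"}
	idx, err := resolveIndexes(schema, []string{"content_type", "time_"})
	if err != nil {
		panic(err)
	}
	rows := [][]string{
		{"t0", "/a", "2"},
		{"t1", "/b", "0"},
	}
	for _, row := range rows {
		fmt.Println(project(row, idx))
	}
}
```

The trade-off is that the index slice must be rebuilt whenever the schema or the requested column order changes.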

Neither is in scope for the contract suite PR. Filing as a separate optimisation PR after this one merges.

cc: PR #53 (production AE) for context on the "AE pod at 2.4 cores" NFR.
