Implement native rules #162
Mirrors protovalidate-go's validator_bench_test.go in a new private packages/protovalidate-bench workspace so runtime cost can be tracked across changes and compared cross-language. Uses tinybench, hand-built deterministic fixtures, and writes JSON results to .tmp/bench/. Adds a checkbench script to diff two runs with a noise-aware regression threshold and non-zero exit on regression, suitable for gating PRs.
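A minimal sketch of how a noise-aware comparison of two bench runs could work. This is not the actual checkbench code: the `BenchResult` shape, the `compareRuns` name, and the rule "a slowdown must exceed both the threshold and the combined margin of error" are assumptions made for illustration.

```typescript
// Hypothetical result shape; the real JSON written to .tmp/bench/ may differ.
interface BenchResult {
  task: string;
  meanNs: number; // mean latency in nanoseconds
  rmePct: number; // relative margin of error, in percent
}

interface Comparison {
  task: string;
  deltaPct: number;
  regressed: boolean;
}

// Flags a task only when the slowdown exceeds both the fixed threshold and
// the combined measurement noise of the two runs (an assumed noise rule).
function compareRuns(
  base: BenchResult[],
  next: BenchResult[],
  thresholdPct = 5,
): Comparison[] {
  const byTask = new Map<string, BenchResult>();
  base.forEach((r) => byTask.set(r.task, r));
  const out: Comparison[] = [];
  next.forEach((n) => {
    const b = byTask.get(n.task);
    if (b === undefined) return; // new task: nothing to compare against
    const deltaPct = ((n.meanNs - b.meanNs) / b.meanNs) * 100;
    const noisePct = b.rmePct + n.rmePct;
    out.push({
      task: n.task,
      deltaPct,
      regressed: deltaPct > thresholdPct && deltaPct > noisePct,
    });
  });
  return out;
}

const results = compareRuns(
  [
    { task: "Scalar", meanNs: 1000, rmePct: 1 },
    { task: "Map", meanNs: 2000, rmePct: 4 },
  ],
  [
    { task: "Scalar", meanNs: 1100, rmePct: 1 },
    { task: "Map", meanNs: 2120, rmePct: 4 },
  ],
);
// A gating script would exit non-zero when any task regressed.
const exitCode = results.some((r) => r.regressed) ? 1 : 0;
```

In this example the Scalar slowdown is well outside the noise and is flagged, while the Map slowdown is past the threshold but inside the combined margin of error and is not.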
Phase 0 of porting protovalidate-go's native rule evaluation. This change
introduces the surface and plumbing without yet replacing any CEL rule.
- Adds `disableNativeRules?: boolean` to `ValidatorOptions` (default false)
- Threads the resolved `regexMatch` and the new flag from `createValidator`
through `Planner` so future phases have a single source of truth for both
- Scaffolds `src/native/`: dispatcher stub returning `{kind:"none"}`, plus
`codepointLength` and `printFloat` helpers used by upcoming handlers
- Wires `Planner.rules()` to consult the dispatcher and skip CEL enrollment
for any field it claims, appending the native eval to the resulting
`EvalMany`. With the stub this is behavior-preserving.
Unit tests cover the new format helpers and assert that the native-on and
native-off paths produce identical violations. Conformance stays green
(2870 passed, 2 expected skips). Benchmark deltas vs `jbodner/add-benchmarks`
sit within the 5% threshold across all 17 suites.
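For context on the `codepointLength` helper: protovalidate's string length rules count Unicode code points, while JavaScript's `String.prototype.length` counts UTF-16 code units. The sketch below shows one way to count code points; it is illustrative and the helper in `src/native/` may be written differently.

```typescript
// Counts Unicode code points rather than UTF-16 code units. Illustrative
// only; the helper in src/native/ may be implemented differently.
function codepointLength(s: string): number {
  let count = 0;
  for (let i = 0; i < s.length; i++) {
    const unit = s.charCodeAt(i);
    // A high surrogate followed by a low surrogate forms one code point.
    if (unit >= 0xd800 && unit <= 0xdbff && i + 1 < s.length) {
      const next = s.charCodeAt(i + 1);
      if (next >= 0xdc00 && next <= 0xdfff) i++;
    }
    count++;
  }
  return count;
}

const ascii = codepointLength("abc");
// "a", U+1F600 (encoded as a surrogate pair), "b"
const withEmoji = codepointLength("a\uD83D\uDE00b");
```

Here `"a\uD83D\uDE00b".length` is 4, but the string holds three code points.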
Eleven near-identical .bench.ts files have collapsed to four: cases.ts lists every (name, schema, fixture) triple in one place, validate.bench.ts iterates it for the per-case validate-time benches, and compile.bench.ts plus standard-schema.bench.ts look up curated subsets by name. Adding a benchmark is now a one-row append to cases.ts plus a fixture in fixtures.ts instead of new-file + import + register call in bench.ts. Bench output is byte-identical: same 17 tasks, same names, same ordering, deltas within the noise floor.
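The registry idea can be sketched roughly as follows. The `BenchCase` shape, the string stand-ins for schemas, and the `pick` helper are hypothetical; only the task names are taken from the bench output quoted in this PR.

```typescript
// Hypothetical shapes; the real cases.ts stores protobuf schemas and messages.
interface BenchCase {
  task: string;
  schema: string; // stand-in for a message schema
  fixture: unknown; // stand-in for a deterministic message fixture
}

// Adding a benchmark is a one-row append here.
const cases: BenchCase[] = [
  { task: "Scalar", schema: "bench.Scalar", fixture: { x: 1 } },
  { task: "Int32GT", schema: "bench.Int32GT", fixture: { x: 5 } },
  { task: "ComplexSchema", schema: "bench.Complex", fixture: {} },
];

// Curated suites look cases up by name and fail loudly on a typo.
function pick(tasks: string[]): BenchCase[] {
  return tasks.map((task) => {
    const found = cases.filter((c) => c.task === task)[0];
    if (found === undefined) throw new Error(`unknown bench case: ${task}`);
    return found;
  });
}

// A per-case suite iterates `cases`; a curated suite selects a subset.
const compileSubset = pick(["Scalar", "ComplexSchema"]);
```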
Pulls in the bench suite consolidation (cases.ts registry + collapsed validate/compile/standard-schema suites) so Phase 1+ can add native rule benchmarks as one-row appends to cases.ts.
Phase 1 of the protovalidate-go native rules port.
Every standard rule on bool and on int32/int64/uint32/uint64/sint32/sint64/fixed32/fixed64/sfixed32/sfixed64/float/double (const, gt, gte, lt, lte, in, not_in, and finite for float/double) now runs through a hand-written TS evaluator instead of CEL. A WrappedValueEval adapter handles the matching `google.protobuf.*Value` wrapper messages so wrapper fields validated against scalar rules go through the same native path.
The dispatcher in planner.ts skips CEL enrollment for fields the native path claims and falls through to CEL for fields it doesn't (unknown extensions, NaN-bound rules, anything not yet ported).
Rule paths, rule IDs (including compound ones like int32.gt_lt_exclusive), and violation messages are byte-identical to the CEL output.
Conformance: 2870 pass / 2 expected skips (matches baseline).
Unit tests: 857 pass; 44 new diff-based tests assert native and CEL violation arrays match for every rule on every scalar type plus every wrapper type.
Benchmark deltas vs phase 1 baseline (mean latency):
Scalar -85.7%
Int32GT -90.0%
MultiRule/NoError -88.4%
MultiRule/Error -58.5%
Repeated/Message -85.0%
WrapperTesting -78.7%
ComplexSchema -63.9%
StandardSchema/Scalar -85.3%
StandardSchema/ComplexSchema -62.4%
String/bytes/map suites unchanged within noise; 0 regressions past 5%.
Updates the protovalidate test script to walk src/**/*.test.ts so the new suites under src/native/ are picked up by `npm test`.
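To illustrate what a compound rule ID such as int32.gt_lt_exclusive means, here is a small sketch of a native range check for a strict lower and a strict upper bound. The `RangeRule` shape and `checkInt32Range` name are hypothetical, and the sketch returns only the rule ID rather than a full violation; the rule IDs follow protovalidate's standard int32 rules.

```typescript
// Hypothetical, simplified rule shape for illustration.
interface RangeRule {
  gt?: number;
  lt?: number;
}

// Returns the violated rule ID, or undefined when the value passes.
function checkInt32Range(rule: RangeRule, value: number): string | undefined {
  const { gt, lt } = rule;
  if (gt !== undefined && lt !== undefined) {
    if (lt >= gt) {
      // Ordinary range: the value must lie strictly between the bounds.
      return value > gt && value < lt ? undefined : "int32.gt_lt";
    }
    // Inverted bounds: the value must lie outside the closed range [lt, gt].
    return value > gt || value < lt ? undefined : "int32.gt_lt_exclusive";
  }
  if (gt !== undefined) return value > gt ? undefined : "int32.gt";
  if (lt !== undefined) return value < lt ? undefined : "int32.lt";
  return undefined;
}

const inside = checkInt32Range({ gt: 0, lt: 10 }, 5);
const onUpperBound = checkInt32Range({ gt: 0, lt: 10 }, 10);
const insideExcludedBand = checkInt32Range({ gt: 10, lt: 0 }, 5);
```

A diff-based test in the style described above would assert that the ID returned here matches the one the CEL path produces for the same input.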
Cleanups surfaced by the post-merge review:
- Collapse the 12 thin tryBuildNativeXxxRules wrappers into one
tryBuildNativeNumericRules that switches on rules.$typeName. Per-type
configs are no longer exported.
- Refactor EvalNativeNumericRules to hold each rule as a narrowed object
({val, path} or {kind, val, path}) instead of separate const/in/lo/hi
fields plus an optional-everywhere paths bag. Removes the `0 as T`
default and every biome-ignore non-null assertion. Range/list helpers
move to module scope and take the narrowed rule object directly.
- Drop the asReflectGet helper; inline the cast at its two call sites
with a one-line comment.
- Remove the unused regexMatch field from NativeDispatchInput (and from
Planner — it was being threaded through but no consumer was reading
it). String handlers will re-add it in phase 4.
- Throw a CompilationError if a wrapper descriptor lacks a "value" field
instead of silently falling back to a non-wrapper code path.
- Update printFloat's docstring to describe what it actually does
(matches CEL-TS Number.toString) instead of the inaccurate "mirrors
protovalidate-go". Note the eventual fix is in cel-es + here together.
- Add the 5 test gaps the review flagged: NaN value with float.in list,
const + range emitting both violations, unset BoolValue wrapper, int64
max-boundary const, and explicit float.finite=false (no-op rule).
Verified:
- 862 unit tests pass (+5 gap tests).
- Conformance: 2870 pass / 2 expected skips, unchanged.
- Bench deltas vs phase1-baseline: 10 improvements (matching phase 1's
wins), 0 regressions past 5%.
- Lint, attw, build green.
Net: 142 lines removed across implementation, 56 lines of new test code.
The "partial" arm of the {kind:"none"|"partial"|"full", ...} union was never
produced — every successful native handler in phase 1 returns full handling,
and "full"-vs-"partial" pruning is already handled implicitly by
EvalMany.prune() dropping any empty EvalStandardRulesCel.
Replace the union with `NativeDispatchResult | undefined` / `ScalarNativeResult
| undefined` so the producer signals "no match" by returning undefined and the
consumer narrows with `native === undefined` / `native?.handledFields.has(…)`.
Removes ~10 lines of boilerplate and a layer of TS narrowing.
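The shape of that change can be sketched as below. The `evaluate` member, the single-argument signature of `tryBuildNative`, and the `needsCel` helper are assumptions for illustration; only the `NativeDispatchResult | undefined` return type and the `native?.handledFields.has(…)` narrowing come from the commit message.

```typescript
// Hypothetical shapes modelled on the names in the commit message.
interface NativeDispatchResult {
  handledFields: Set<string>;
  evaluate: (value: unknown) => string[]; // returns violated rule IDs
}

// "No match" is signalled by undefined rather than a {kind: "none"} arm.
function tryBuildNative(typeName: string): NativeDispatchResult | undefined {
  if (typeName !== "buf.validate.BoolRules") return undefined;
  return {
    handledFields: new Set<string>(["const"]),
    evaluate: (value) => (value === true ? [] : ["bool.const"]),
  };
}

// The consumer narrows with optional chaining; unclaimed fields stay on CEL.
function needsCel(typeName: string, field: string): boolean {
  const native = tryBuildNative(typeName);
  return native?.handledFields.has(field) !== true;
}
```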
No behavior change. 862 unit tests pass, conformance 2870/2 expected skips/0
fail unchanged, lint/attw/build green.
Phase 2 of the protovalidate-go native rules port.
- enum: const, in, not_in. defined_only keeps its existing dedicated evaluator (EvalEnumDefinedOnly); defined_only and the native subset coexist on the same field.
- repeated (list-level): min_items, max_items, unique. unique is handled natively for scalar / enum / bytes element kinds; message-element lists fall through to CEL for unique while min/max_items still run natively.
- map: min_pairs, max_pairs.
The dispatcher grows a switch on rules.$typeName so enum/repeated/map route to the right per-type builder. RepeatedRules dispatch receives the list field descriptor so the unique builder can pick the right comparator (strict-equal Set for number/bigint/string/bool/enum; bytes-string-key Set for Uint8Array). The planner threads the field through.
Verified:
- 883 unit tests pass (+21 new diff-based tests covering each rule type, the unique-by-element-kind matrix, the message-element fallthrough, and a path-shape assertion).
- Conformance: 2870 pass / 2 expected skips (unchanged).
- Lint, attw, build green.
Benchmark deltas vs phase2-baseline.json (mean latency, two runs):
Map -62%
Repeated/Scalar -67%
Repeated/Unique/Bytes -45%
Repeated/Unique/Scalar -46%
Repeated/Message -29%
ComplexSchema -40%
StandardSchema/Complex -31%
0 regressions past 5% threshold across two runs. String/bytes/wrapper suites stay within noise (their rules are still on CEL until phases 3-5).
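A rough sketch of the two comparators mentioned for repeated.unique: primitives go into a Set directly, while Uint8Array elements are first mapped to a string key because a Set compares objects by identity. The key encoding and the `isUnique` name are assumptions for illustration, not the code in repeated.ts.

```typescript
type ListElement = number | bigint | string | boolean | Uint8Array;

// Uint8Array values are compared by identity in a Set, so each is mapped to
// a string key first. This key encoding is an assumption for illustration.
function bytesKey(bytes: Uint8Array): string {
  return Array.from(bytes).join(",");
}

// Returns true when no two elements are equal. Lists are assumed to be
// homogeneous, as repeated fields are, so byte keys never mix with strings.
function isUnique(list: ListElement[]): boolean {
  const seen = new Set<number | bigint | string | boolean>();
  return list.every((el) => {
    const key = el instanceof Uint8Array ? bytesKey(el) : el;
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}

const scalarsUnique = isUnique([1, 2, 3]);
const bytesDuplicate = isUnique([new Uint8Array([1, 2]), new Uint8Array([1, 2])]);
```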
Cleanups surfaced by the post-merge review:
- Move formatList to format.ts as a generic helper taking an element
formatter. numeric.ts and enum.ts now share it.
- Refactor EvalNativeRepeatedRules and EvalNativeMapRules to hold each
rule as a narrowed {val, path} (or {kind, val, path}) record, matching
numeric.ts's phase-1 cleanup. Drops every biome-ignore non-null
assertion in repeated.ts and map.ts.
- Use isFieldSet(unique) as the gate in repeated.ts so explicit
`unique: false` claims the field (no-op rule) for parity with
numeric.ts's `finite` treatment.
- Add an in-file comment to repeated.ts explaining the deliberate
divergence from protovalidate-go: when unique:true is set on a
message-element list, the TS port keeps min/max_items native and only
releases unique to CEL, instead of bailing the entire RepeatedRules
handler. Conformance holds in both shapes.
- Drop the misleading `as number` cast on Uint8Array indexing.
- Reword the forMapKey guard comment in repeated.ts to call it a
type-level invariant tripwire (the planner never dispatches
RepeatedRules with forMapKey=true).
- Mark ListNativeResult and MapNativeResult as @internal.
- Unify enum.ts guard style with numeric.ts by introducing a small
`contains` helper so biome doesn't push toward `?.` chaining for one
case and leave `!== undefined` for the other.
Tests:
- Pin pathToString output exactly ("repeated.min_items" etc.) instead
of substring matching either casing.
- Add path-shape assertions for repeated.max_items, repeated.unique,
map.min_pairs, map.max_pairs.
- Cover Uint8Array elements in repeated.unique tests: empty buffers
(single + duplicate-empty).
- Add empty-list test for standalone repeated.max_items.
- Add a third assertion to the message-element fallthrough test that
exercises the CEL-handled unique violation path.
- Add an enum.in test for the proto3 default-zero scenario.
- Add a combined min_items + max_items + unique test.
- Add a repeated.unique = false test confirming the no-op semantics.
Verified:
- 891 unit tests pass (+8 from 883).
- Conformance: 2870 / 2 expected skips / 0 fail (unchanged).
- Lint, attw, build green.
- Bench: 0 regressions vs phase2-baseline.json across 17 tasks;
11 improvements (phase 2's wins intact).
The 5 native-rule test files each carried a verbatim copy of the same 30
lines of boilerplate: `bufCompileOptions`, the `native` and `cel`
validators, the `diff(schema, msg)` helper, and a near-identical
`compile(proto)` helper. repeated.test.ts even diverged into compileFile
to support multi-message schemas, drifting from the others.
Consolidate into `src/native/testing.ts`:
- `bufCompileOptions`, `native`, `cel` exported once.
- `diff(schema, msg)` validates with both paths and asserts violation
arrays are byte-identical.
- `compile(definition, { preamble? })` always uses compileFile and
expects `message M { ... }` as the validation target. Helper messages
/ enums go in `preamble`. Wrappers are always imported (harmless when
unused) so callers no longer need to manage that.
Test files now declare only their per-suite preambles:
- bool.test.ts: no preamble
- numeric.test.ts: no preamble
- map.test.ts: no preamble
- enum.test.ts: `enum Color { … }`
- repeated.test.ts: `enum Color { … } message Inner { … }`
Net diff: -169 lines across the 5 test files, +88 lines for testing.ts.
891 unit tests pass unchanged, conformance 2870/2 skipped/0 fail
unchanged, lint/attw/build green.
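The core of the diff helper can be sketched independently of protobuf. Everything here is a stand-in: the `Violation` shape, the `makeDiff` factory, and the two toy validators are hypothetical, and the real `diff(schema, msg)` in `src/native/testing.ts` works on compiled schemas and the two configured validators.

```typescript
// Hypothetical violation shape for illustration.
interface Violation {
  ruleId: string;
  fieldPath: string;
  message: string;
}
type Validate<M> = (msg: M) => Violation[];

// Builds a diff() that runs both paths and throws on any divergence.
function makeDiff<M>(native: Validate<M>, cel: Validate<M>): Validate<M> {
  return (msg) => {
    const a = native(msg);
    const b = cel(msg);
    if (JSON.stringify(a) !== JSON.stringify(b)) {
      throw new Error(`native and CEL violations differ for ${JSON.stringify(msg)}`);
    }
    return a;
  };
}

// Toy validators standing in for the CEL-backed and native-backed paths.
const viaCel: Validate<{ x: number }> = (m) =>
  m.x > 0
    ? []
    : [{ ruleId: "int32.gt", fieldPath: "x", message: "value must be greater than 0" }];
const viaNative: Validate<{ x: number }> = (m) => viaCel(m);
const diffBoth = makeDiff(viaNative, viaCel);
```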
Phase 3 of the protovalidate-go native rules port.
Bytes rules now run through a hand-written TS evaluator instead of CEL:
const, len, min_len, max_len, prefix, suffix, contains, in, not_in,
pattern, plus the well-known formats ip/ipv4/ipv6/uuid. The well-known
formats validate byte-slice lengths (4 and/or 16) the same way Go does
and emit the bytes.{ip,ipv4,ipv6,uuid}[_empty] rule IDs that match CEL's
predefined annotations byte-for-byte.
bytes.pattern requires valid UTF-8: invalid input surfaces as a
RuntimeError, matching CEL's `string(bytes)` cast which uses
`new TextDecoder("utf-8", { fatal: true })`. The pattern engine is
honored from the validator's regexMatch option when supplied; the
default uses the ECMAScript `RegExp` engine, the same fallback CEL uses
today. Phase 4 will swap the default to the cel-es `re2` package.
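The decoding and matching behavior described above can be sketched as follows. The `RuntimeError` class here is a local stand-in for the library's error type, and `matchBytesPattern` is a hypothetical name; the `(pattern, against)` argument order for `regexMatch` is inferred from how it is called elsewhere in this PR.

```typescript
// Local stand-in for the library's RuntimeError.
class RuntimeError extends Error {}

type RegexMatch = (pattern: string, against: string) => boolean;

// Decodes strictly as UTF-8, then matches with the custom engine if one was
// supplied, or with the ECMAScript RegExp engine otherwise.
function matchBytesPattern(
  pattern: string,
  value: Uint8Array,
  regexMatch?: RegexMatch,
): boolean {
  let text: string;
  try {
    // fatal: true makes decode() throw on invalid UTF-8 instead of
    // substituting U+FFFD replacement characters.
    text = new TextDecoder("utf-8", { fatal: true }).decode(value);
  } catch {
    throw new RuntimeError("bytes.pattern requires valid UTF-8");
  }
  return regexMatch !== undefined
    ? regexMatch(pattern, text)
    : new RegExp(pattern).test(text);
}

// "abb" as UTF-8 bytes
const matched = matchBytesPattern("^ab+$", new Uint8Array([0x61, 0x62, 0x62]));
```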
regexMatch is re-threaded through Planner → tryBuildNative; phase 1's
cleanup removed it because no handler used it. Bytes is the first
handler that needs it. validator.ts captures opt?.regexMatch into a
local and passes it to both CelManager and Planner so they share a
single matcher.
BytesValue (WKT) wraps the native scalar evaluator via the existing
WrappedValueEval adapter — no new plumbing needed.
Verified:
- 915 unit tests pass (+24 new bytes tests covering every rule, well-
known formats with valid/empty/wrong-size paths, UTF-8 RuntimeError,
regexMatch override, BytesValue wrapper, and four rule-path-shape
assertions).
- Conformance: 2870 pass / 2 expected skips / 0 fail — unchanged.
- Lint, attw, build green.
Benchmark deltas vs phase3-baseline.json (mean latency):
TestByteMatching -84%
WrapperTesting -19%
ComplexSchema -13%
StandardSchema/ComplexSchema -6%
0 regressions past 5%. Numeric/repeated/map/enum suites stay within
noise; their rules were already native.
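As an illustration of the well-known format checks, the sketch below validates only the byte length and picks between the plain and `_empty` rule IDs. The function name and table layout are hypothetical; the sizes (4 and/or 16) and the rule-ID pattern are taken from the description above.

```typescript
type WellKnownKind = "ip" | "ipv4" | "ipv6" | "uuid";

// Accepted byte lengths per format.
const validSizes: Record<WellKnownKind, number[]> = {
  ip: [4, 16],
  ipv4: [4],
  ipv6: [16],
  uuid: [16],
};

// Returns the violated rule ID, or undefined when the length is acceptable.
function checkWellKnown(kind: WellKnownKind, value: Uint8Array): string | undefined {
  if (value.length === 0) return `bytes.${kind}_empty`;
  return validSizes[kind].indexOf(value.length) >= 0 ? undefined : `bytes.${kind}`;
}

const okV4 = checkWellKnown("ip", new Uint8Array(4));
const oneByte = checkWellKnown("ip", new Uint8Array(1));
const emptyV4 = checkWellKnown("ipv4", new Uint8Array(0));
```

A 1-byte input is a wrong-size violation (`bytes.ip`), not an empty-value violation, which is the distinction one of the follow-up tests pins down.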
Cleanups surfaced by the post-merge review:
Bytes handler refactor:
- Collapse the three parallel WELL_KNOWN_* tables (validSizes, msg, emptyMsg) into one Record<WellKnownKind, …> so the per-kind specs can't drift apart when a new format is added.
- Inline the resolved well-known spec into WellKnownRule itself, so eval() no longer keys into lookup tables on every fire.
- Collapse BytesConstRule and BytesValRule into one BytesRule type; they had identical shape and were used by const/prefix/suffix/contains.
- Bundle the 12 positional constructor args into one BytesRulesConfig options object. Each rule is now `cfg.constRule?`, `cfg.prefix?`, etc.
- Drop the `let test: undefined` + try/catch shape; `test` is now a proper local declared inside the try and assigned exactly once.
- Inline the `bytesToCelString` one-line shim at the two call sites.
- Claim well-known fields on isFieldSet regardless of value: explicit `ip: false` (or any *: false) is a no-op rule but now claims the field, matching `repeated.unique: false` and `float.finite: false`.
- Wrap user-supplied regexMatch exceptions in RuntimeError so the CEL behavior is preserved end-to-end when a custom engine throws at match time. The default RegExp engine doesn't throw at match time, so this only matters for opt.regexMatch overrides.
- Add a clarifying comment to bytes.contains documenting that an empty needle matches every input (matches Go's bytes.Contains semantics).
- Add a comment to bytes.not_in noting the asymmetry with bytes.in: CEL's not_in expression has no size() > 0 guard, but `_ in []` is always false so the behavior is identical.
Tests:
- Fix the misleading "claimed but emits nothing" comment in the ip=false test: the field IS now claimed under the new convention. Add direct native.validate() assertions to lock in the claim semantics.
- Add a bytes.ip 1-byte test confirming it emits `bytes.ip` (not `bytes.ip_empty`).
- Add bytes.const empty-rule + empty-input + non-empty-input cases.
- Add bytes.pattern with empty pattern (always-pass).
- Add bytes.in: [] and bytes.not_in: [] empty-list no-op cases.
- Add bytes.contains with empty needle.
- Add BytesValue wrapper with absent inner value.
- Add a regexMatch-throws test confirming the RuntimeError wrap.
Verified:
- 923 unit tests pass (+8 new).
- Conformance: 2870 / 2 expected skips / 0 fail (unchanged).
- Lint, attw, build green.
- Bench: 0 regressions vs phase3-baseline.json; phase 3's wins intact (TestByteMatching -84%, WrapperTesting -18%, ComplexSchema -12%).
Each of bytesDescs, enumDescs, repeatedDescs, and mapDescs had exactly
one consumer and one schema, so the indirection was just renaming
*RulesSchema.field as something shorter. Each handler now imports its
schema directly and aliases `.field` locally as `F`:
const F = BytesRulesSchema.field;
...
if (isFieldSet(rules, F.const)) { ... }
rulePath.clone().field(F.const).toPath();
NumericRulesDescs and the per-numeric-type descs stay — there are 12
scalar types sharing one shape via NumericConfig.descs, so the
abstraction earns its keep. boolConstDesc also stays as a one-liner
since it has a meaningful name (just "the field for bool.const") and
no factory/shape to remove.
Net: -65 lines (sites.ts shrinks from 156 to 109; handlers gain ~3-4
chars per field reference but lose an import line).
Verified: 923 unit tests pass, conformance 2870/2 expected skips/0
fail unchanged, lint/attw/build green.
The default regex path eagerly compiles the pattern via `new RegExp(src)` inside `defaultRegexTest`, so an invalid pattern throws at plan time and the surrounding try/catch routes that field back to CEL. The custom `regexMatch` path was lazy: storing `(against) => regexMatch(src, against)` without ever invoking the engine, so the same try/catch had no effect on that branch.
Probe `regexMatch(src, "")` at plan time so a user-supplied engine that can't compile the pattern fails fast. The empty string is the contract-safe probe: any regex engine must be able to test an arbitrary pattern against the empty string. If the engine throws, native bails and CEL's own `matches()` call surfaces the failure as a RuntimeError via the same engine.
Also corrects the catch comment, which previously claimed CEL produces a CompilationError; both paths actually produce a RuntimeError at eval time, which is what conformance expects.
923 unit tests pass unchanged, conformance 2870/2 expected skips/0 fail unchanged, lint/attw/build green.
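The plan-time probe can be sketched like this. `buildPatternTest` and `PatternTest` are hypothetical names, and returning undefined stands in for "native bails and the field goes back to CEL"; the real handler is structured differently.

```typescript
type RegexMatch = (pattern: string, against: string) => boolean;
type PatternTest = (against: string) => boolean;

// Returns a matcher, or undefined when the pattern cannot be compiled and
// the field should be left to the CEL path.
function buildPatternTest(src: string, regexMatch?: RegexMatch): PatternTest | undefined {
  const engine = regexMatch;
  try {
    if (engine === undefined) {
      const re = new RegExp(src); // throws SyntaxError on an invalid pattern
      return (against) => re.test(against);
    }
    engine(src, ""); // probe: forces the custom engine to compile now
    return (against) => engine(src, against);
  } catch {
    return undefined;
  }
}

const invalidDefault = buildPatternTest("(");
const invalidCustom = buildPatternTest("(", () => {
  throw new Error("cannot compile pattern");
});
```

Without the probe line, the custom-engine branch would return a matcher for any pattern and the failure would only surface on the first match attempt.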
Signed-off-by: Jon Bodner <jbodner@buf.build>
…. Update documentation to reflect its current arguments. Signed-off-by: Jon Bodner <jbodner@buf.build>
…idate-es into jbodner/add-benchmarks
… gc settings for different tests. Signed-off-by: Jon Bodner <jbodner@buf.build>
…include EvalStandardRulesCel if there are native rules for every field.
bench.ts accepts --metric cpu|memory|both (default both). cpu skips the heap probe to save ~10-20% per run. The chosen metric is recorded in the JSON output and forwarded to spawned worker processes.
checkbench.ts accepts --metric cpu|memory|both (default both) to control which signals can trigger REGRESS markers. cpu gates on mean latency only, memory on heap only, both on mean+heap. Files produced with one metric mode but compared under another emit a warning.
Min latency is no longer a regression gate; it remains in the table as an informational signal but never fails the run. Min is too sensitive to JIT warmth and per-process scheduling to be a reliable gate; anything genuinely regressed shows up in mean as well.
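The gating rule described above could look roughly like this. The `Delta` shape and `isRegression` name are hypothetical, and applying the same 5% threshold to heap as to mean latency is an assumption.

```typescript
type Metric = "cpu" | "memory" | "both";

// Hypothetical per-task deltas between two runs, in percent.
interface Delta {
  meanPct: number;
  minPct: number;
  heapPct: number;
}

// cpu gates on mean latency only, memory on heap only, both on either one.
// minPct is deliberately ignored: it is informational and never gates.
function isRegression(d: Delta, metric: Metric, thresholdPct = 5): boolean {
  const cpuRegressed = metric !== "memory" && d.meanPct > thresholdPct;
  const heapRegressed = metric !== "cpu" && d.heapPct > thresholdPct;
  return cpuRegressed || heapRegressed;
}

const heapOnly: Delta = { meanPct: 1, minPct: 12, heapPct: 9 };
const underCpu = isRegression(heapOnly, "cpu");
const underBoth = isRegression(heapOnly, "both");
```

With these values the heap growth is ignored under `cpu` and flagged under `both`, and the large min-latency delta never matters.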
timostamm left a comment
The dispatch for numeric types is challenging. I don't think there is a perfectly clean solution in TypeScript, but I do see more options for significant weight-shedding. I suggest combining all three branches. It'll be a bigger diff, but it minimizes the churn. LMK if that works for you.
// Eval is invariant in its parameter; the cast is safe because every
// ScalarValue is also a valid ReflectMessageGet at runtime.
It looks like the cast isn't necessary at all.
): ScalarNativeResult | undefined {
  const { rules, rulePath, forMapKey } = input;
  if (rules.$typeName === BoolRulesSchema.typeName) {
    return tryBuildNativeBoolRules(rules as BoolRules, rulePath, forMapKey);
Same here, as BoolRules is not needed.
 * (NaN bound, unknown extensions, no fields set).
 */
export function tryBuildNativeNumericRules(
  rules: Message<string>,
All casts in the body can go away if you accept the union of all numeric rule messages (FloatRules | DoubleRules | Int32Rules | ...) here, because then the compiler can narrow down via the switch on $typeName, which becomes the discriminator for the type union.
It may feel strange to spell out each type, but you get better static analysis, and the types cost nothing at runtime and don't add to the bundle size.
I took a peek at the follow-up branches. You'll need to add explicit cases for the unhandled types (AnyRules, etc.) in tryBuildNative so that they don't fall into the branch that calls tryBuildNativeNumericRules, but I actually think that's a plus because it's documenting the gap.
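A small sketch of the narrowing this suggestion relies on. The two interfaces are simplified stand-ins for the generated rule messages (which carry many more fields), and `summarize` is a hypothetical function; the point is only that a literal `$typeName` acts as the discriminant of the union.

```typescript
// Simplified stand-ins for the generated rule messages.
interface Int32Rules {
  $typeName: "buf.validate.Int32Rules";
  gt?: number;
}
interface FloatRules {
  $typeName: "buf.validate.FloatRules";
  gt?: number;
  finite?: boolean;
}
type NumericRules = Int32Rules | FloatRules;

// Switching on $typeName narrows `rules` in each case, so no casts are needed.
function summarize(rules: NumericRules): string {
  switch (rules.$typeName) {
    case "buf.validate.Int32Rules":
      return `int32 gt=${String(rules.gt)}`;
    case "buf.validate.FloatRules":
      // `finite` is only accessible because the compiler narrowed to FloatRules.
      return `float finite=${String(rules.finite)}`;
  }
}

const summary = summarize({ $typeName: "buf.validate.FloatRules", finite: true });
```

If the parameter were typed as `Message<string>` instead, `$typeName` would be a plain string and the accesses inside each case would need casts.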
There's additional work to add as well to support strings. I can add them all to this PR.
…odner/native-rules-phase-1-numeric-bool
…-rules-phase-1-numeric-bool
…st code. update README
…m:bufbuild/protovalidate-es into jbodner/native-rules-phase-1-numeric-bool
All of the native rules are now on this branch, including string support. I still need to go through your comments.
This PR provides a native implementation of the protovalidate numeric and boolean rules. Validation of fields of those types should be about 80-90% faster.