fix[next]: extend fingerprinting to cover all relevant cache state via structural hashing#2648
fix[next]: extend fingerprinting to cover all relevant cache state via structural hashing#2648egparedes wants to merge 56 commits into
Conversation
…g in fingerprinters - Add `fingerprint` (alias for `sorting_sets_fingerprinter`) and `versioned_fingerprint` (includes BUILD_CACHE_VERSION_ID) to `utils.py` - Fix `skipping_fields_node_fingerprinter` to pass the reducer dict as a positional arg (not keyword) to `CustomPicklingFingerprinter.from_reducers` - Fix `custom_overriden_pickler` in `eve/utils.py` to use `pickle._Pickler` (pure Python) instead of the C-extension `pickle.Pickler` so that `reducer_override` is called for built-in types like `dict` and `set` (the C-extension fast path bypasses `reducer_override` for built-in types) - Update `test_cached_with_hashing` to use a module-level function instead of a lambda (lambdas can't be pickled, now that `cache_key` fingerprints `self`) - Add tests for `fingerprint`, `versioned_fingerprint`, and `cache_key` behavior Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…on stage classes Re-add the `fingerprinter` module-level alias (= `semantic_fingerprinter`) and the `fingerprint` computed property on all four stage dataclasses (`DSLFieldOperatorDef`, `FOASTOperatorDef`, `DSLProgramDef`, `PASTProgramDef`). These were removed when the old `FingerprintedABC`/`FingerprintedMixin` system was dropped in the 'More refactoring' commit but are still needed by existing tests and callers. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Re-add the `fingerprint()` method to the `ir.Node` base class that was removed along with the `FingerprintedABC`/`FingerprintedMixin` system. The method is needed by: - `ffront.lowering_utils` (uses `itir.Expr.fingerprint()` to generate unique variable names) - `ffront.foast_to_gtir` (uses `itir.Expr.fingerprint()` to generate unique SymRef names for conditionals) - `iterator_tests/test_ir.py` (tests that fingerprinting ignores location/type) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
NumPy ufuncs (e.g. np.cbrt in DSL closure variables) only pickle through their copyreg.dispatch_table reducer; calling __reduce_ex__ directly raises. Consult the dispatch table first, like pickle.Pickler does.
Propagate the rename through docstrings, the tree_cata doctest, tests and ADR 0023; cache _class_tag with functools.cache.
atom_reducer/composite_reducer/cycle_reducer (formerly leaf_alg/node_alg/ cycle_alg), with matching parameter and prose updates in tests and ADR.
Extractor (formerly ObjectDecomposer), CompositeContent (formerly ObjectDecomposition), AtomicContent (formerly DecompositionAtom), AtomicReducer (formerly AtomReducer), with matching extract/atomic_reducer parameter names and prose updates in docstrings, tests and ADR 0023.
make_fingerprinter() now takes an Extractor instead of a mapping of per-type overrides; the new make_extractor(overrides) builds extractors from the default rules, and the module-level 'extract' is the default Extractor. skipping_fields_node_fingerprinter returns extractor overrides (return_extractors=) for composition via make_extractor.
Deconstructor (formerly Extractor), Deconstruction (formerly CompositeContent), EmptyDeconstruction (formerly AtomicContent), Collapser (formerly AtomicReducer), CompositeCollapser (formerly CompositeReducer), with matching deconstruct/collapser parameter and factory names (make_deconstructor, return_deconstructors, ...). Also reconcile with the reshaped driver API: cycles are now collapsed by the regular collapser as back-reference EmptyDeconstructions when allow_cycles=True (the CycleCollapser alias and the fingerprint cycle collapser are gone); fingerprint digests are unchanged.
Deconstruction becomes the base Collection of deconstructed pieces with EmptyDeconstruction (terminal, no pieces) and OrderInsensitiveDeconstruction (pieces collapse in canonical order) subclasses, replacing the 'ordered' flag; builders (from_pieces, from_typed_value, from_reference) are the construction API and all call sites, tests, docstrings and ADR 0023 adopt the pieces vocabulary. Fingerprint digests are unchanged.
- reduce_object takes a single 'collapser' (renamed param 'deconstructor'):
composites are collapsed by re-wrapping the already-collapsed piece
results via dataclasses.replace, and the driver canonicalizes the order
of OrderInsensitiveDeconstruction results itself.
- The fingerprint digest scheme is unified accordingly
(xxh64('node' + state + 'pieces' + digests) for empty and composite
alike), intentionally changing all fingerprint values (no migration
needed, stale cache keys are simply never hit again).
- make_fingerprinter is replaced by functools.partial composition;
make_deconstructor gains a 'fallback' parameter and fingerprinters use
fingerprint_fallback, which layers the gt4py_metadata(fingerprint=False)
opt-out over the default dataclass/datamodel field deconstruction.
Dispatching dataclasses/datamodels through virtual ABC registry keys
(xtyping.DataclassABC / new datamodels.DataModelABC) was attempted but
breaks stdlib singledispatch MRO composition for eve's generic
datamodel classes (RuntimeError: Inconsistent hierarchy), so they are
handled in the fallbacks instead.
- New eve datamodels.DataModelABC (virtual ABC via subclass hook) with
unit test.
Also fix the import-order NameError reintroduced by moving the public deconstructor builders above their implementation dependencies.
havogt
left a comment
There was a problem hiding this comment.
Automated code review (Claude, high effort: 7 finder angles, per-finding adversarial verification with reproduction scripts). 10 findings, ranked most severe first as inline comments.
The two most consequential findings pull in opposite directions and suggest the location-handling changed by accident, not design: the ffront-stage cache keys became location-insensitive (cross-file cache aliasing) while the ITIR-level persistent cache keys became location-sensitive (permanent cache misses on unrelated edits). upstream/main had exactly the opposite arrangement in both places.
🤖 Generated with Claude Code
| lambda o: workflow.CachedStep( | ||
| o.bare_translation, | ||
| hash_function=stages.fingerprint_compilable_program, | ||
| key_function=utils.stable_fingerprinter, |
There was a problem hiding this comment.
[1/10] Persistent translation cache key now includes location and type node fields
This CachedStep (backed by a persistent FileCache) switched from the location/type-skipping fingerprint_compilable_program to plain utils.stable_fingerprinter, which hashes every ITIR node's SourceLocation (absolute file path, line, column) and lazily-inferred type field. Same issue in dace/workflow/factory.py:50.
Verified empirically: stable_fingerprinter(SymRef(id='x', location=SourceLocation(1, 1, '/path/a.py'))) differs from the same node with a different path, while the old key was equal.
Impact: with GT4PY_BUILD_CACHE_LIFETIME=persistent (e.g. icon4py), inserting a blank line above a stencil, moving the checkout, or sharing the cache dir across paths changes every key → guaranteed miss, full C++/SDFG recompilation, unbounded cache growth. The skipping_fields_node_fingerprinter('location', 'type') deconstructors already exist (via return_deconstructors=True) and could be composed into this key function.
| #: Fingerprinter for the frontend stages: skips source locations on AST nodes | ||
| #: and fingerprints DSL definition functions by their source code and closure | ||
| #: variables (instead of by qualified name). | ||
| semantic_fingerprinter: utils.Fingerprinter = functools.partial( |
There was a problem hiding this comment.
[2/10] FOAST/PAST cache keys became location-insensitive → cached lowering shared across files with wrong SourceLocations
semantic_fingerprinter skips location on all FOAST/PAST nodes. This inverts the invariant the deleted test_stages.py tests asserted (test_fingerprint_stage_foast_op_def / test_fingerprint_stage_past_def: same code at different locations must fingerprint differently).
Verified empirically: two field operators with byte-identical source in mod_one.py and mod_two.py produce equal FOAST-stage fingerprints. The process-wide foast_to_gtir/past_to_itir CachedSteps then return the first operator's cached lowering — with mod_one.py SourceLocations baked in by PreserveLocationVisitor — for the second operator, so its error messages, warnings, and dace debug info point at the wrong file/line.
| return target is obj | ||
|
|
||
|
|
||
| def _reference_by_fully_qualified_name(obj: Any) -> str: |
There was a problem hiding this comment.
[3/10] Regression: locally-defined FrozenNamespace subclass instances in closure vars now crash at decoration
The old fingerprint path (add_content_to_fingerprint) had a never-failing str(obj) fallback. Now _reference_by_fully_qualified_name raises TypeError for locally-defined classes reachable via __reduce_ex__.
Verified empirically against both venvs: a field operator whose closure vars hold a locally-defined eve.utils.FrozenNamespace subclass instance — a frontend-valid ConstantPythonNamespaceObject (type_translation.py:262) and the icon4py constants idiom — crashes at @field_operator decoration with TypeError: Objects which are not importable under their qualified name ('__main__.....<locals>.Consts') ... cannot be safely referenced, while upstream/main decorates and compiles fine. Module-level namespaces, enums, and scalars are unaffected — narrow, but a hard crash before compilation.
| ) | ||
| device_type = core_defs.DeviceType.CPU | ||
| hash_function = stages.compilation_hash | ||
| key_function = utils.stable_fingerprinter |
There was a problem hiding this comment.
[4/10] Executor cache key content-hashes full neighbor tables (D2H copy for CuPy)
The replaced stages.compilation_hash hashed offset_provider by identity via common.hash_offset_provider_items_by_id (~1.2 µs, O(1) in mesh size). stable_fingerprinter has no ndarray deconstructor, so connectivity arrays inside CompileTimeArgs.offset_provider fall through to __reduce_ex__(2) = full data bytes: measured ~2.9 ms per 1M-entry int32 table (~2400× slower). For CuPy arrays __reduce__ calls .get(), i.e. a synchronous device-to-host copy. Same in dace/workflow/backend.py:52.
Impact: once per program variant on the main CompiledProgramsPool path (compile/startup cost), but per-invocation on the iterator/runtime.py:92 fendef dispatcher and FieldOperatorFromFoast.__call__ (decorator.py:715) paths — every call pays a multi-MB mesh re-hash plus a D2H transfer on GPU.
| #: developers or advanced users when making changes requiring forced | ||
| #: compatibility with previously cached builds. | ||
| BUILD_CACHE_VERSION_ID: str = ( | ||
| os.environ.get("BUILD_CACHE_VERSION_ID") or _get_build_cache_version_id() |
There was a problem hiding this comment.
[5/10] Env var missing the GT4PY_ prefix
Every other env var in this file uses the GT4PY_ prefix (GT4PY_BUILD_CACHE_DIR, GT4PY_BUILD_CACHE_LIFETIME, GT4PY_BUILD_JOBS, ...); this one reads bare BUILD_CACHE_VERSION_ID.
A user following the convention sets GT4PY_BUILD_CACHE_VERSION_ID and is silently ignored; conversely an unrelated CI/HPC tool exporting the generic name BUILD_CACHE_VERSION_ID silently pins or perturbs every gt4py cache key — including reusing stale binaries across gt4py upgrades, exactly what this salt exists to prevent. (The ADR documents the unprefixed name too and would need updating.)
| __datamodel_params__: ClassVar[utils.FrozenNamespace[Any]] | ||
|
|
||
| @classmethod | ||
| def __subclasshook__(cls, subclass: type) -> bool: |
There was a problem hiding this comment.
[6/10] __subclasshook__ returns False instead of NotImplemented, breaking the ABC hook contract
The hook ignores cls and never returns NotImplemented. Verified empirically:
- for
class MyBase(DataModelABC),issubclass(AnyUnrelatedDatamodel, MyBase)returnsTrue(hook inherited,clsignored); - a genuine direct subclass
class X(DataModelABC)failsissubclass(X, DataModelABC)because returningFalsesuppresses the normal MRO fallback thatNotImplementedwould allow.
Correct pattern:
if cls is DataModelABC:
return is_datamodel(subclass)
return NotImplemented(Note: DataModelABC currently has no production callers — see the dead-infrastructure comment on eve/utils.py.)
| bytearray: lambda obj: EmptyDeconstruction.from_typed_value(type(obj), bytes(obj)), | ||
| tuple: lambda obj: Deconstruction.from_pieces(*obj, state=_class_obj_tag(obj)), | ||
| list: lambda obj: Deconstruction.from_pieces(*obj, state=_class_obj_tag(obj)), | ||
| dict: lambda obj: Deconstruction.from_pieces( |
There was a problem hiding this comment.
[7/10] dict/list subclass instance state silently dropped by MRO dispatch
singledispatch walks the MRO, so this dict entry also captures all dict subclasses, hashing only the class tag + items() and bypassing the __reduce_ex__ fallback that would capture extra instance state.
Verified empirically: two instances of a dict subclass with equal items but different extra-attribute values produce identical fingerprints (same for list subclasses). Reachable via closure_vars: dict[str, Any] in ffront stages or a user-supplied offset_provider mapping → false cache hit returns an artifact compiled for the other object.
| step: Workflow[StartT, EndT] | ||
| hash_function: Callable[[StartT], HashT] = dataclasses.field(default=hash) # type: ignore[assignment] | ||
| cache: OpaqueMutableMapping[HashT, EndT] = dataclasses.field(repr=False, default_factory=dict) | ||
| key_function: Callable[[StartT], HashT] = dataclasses.field( |
There was a problem hiding this comment.
[8/10] Breaking CachedStep API change with no deprecation path
hash_function (default=hash) was renamed to key_function and is now required, and cache_key fingerprints self, so steps containing lambdas/closures raise at first call (a test removed by this PR explicitly used step=lambda inp: [*inp, 1]).
Verified empirically:
CachedStep(step=tuple)→TypeError: missing ... 'key_function'CachedStep(step=tuple, hash_function=str)→ unexpected keywordCachedStep(step=lambda inp: [*inp, 1], key_function=str)([1])→TypeError: not importable under their qualified name
The new docstring and WorkflowPatterns.md document the restriction, so this is clearly intentional — but downstream code (icon4py-style custom workflows, the pre-PR WorkflowPatterns.md recipe) breaks at construction or first call with no deprecation period.
| return cast(xtyping.SingleDispatchCallable[P, T], result) | ||
|
|
||
|
|
||
| def merge_dispatchers( |
There was a problem hiding this comment.
[9/10] Dead-on-arrival eve infrastructure (3 items)
Verified by grep — each is referenced only by its own new unit tests, with zero production callers:
merge_dispatchers(here, ~30 lines + 3 tests) —ffront/stages.py:59composes deconstructors via plain dict merging intomake_deconstructor({**foast.semantic_fingerprint_deconstructors, ...})instead;DataModelABC(datamodels/core.py:104) —next/utils.pydeliberately avoids it with an explicit comment that virtual-ABC dispatch breakssingledispatch, and callsdatamodels.is_datamodeldirectly;@runtime_checkableonSingleDispatchCallable(extended_typing.py:228) — noisinstance/issubclasscheck against the protocol exists anywhere; the pre-existingis_single_dispatch_callabletypeguard uses getattr duck-typing.
Same theme, smaller: the empty BaseStage marker class (ffront/stages.py:37) and the fingerprinter = semantic_fingerprinter alias (stages.py:72) are referenced nowhere / tests-only.
| _DECONSTRUCT_PICKLE_REDUCE_PROTOCOL: Final[int] = 2 | ||
|
|
||
|
|
||
| def _dataclass_fields(cls: type) -> Optional[tuple[Any, ...]]: |
There was a problem hiding this comment.
[10/10] Avoidable per-node costs on the fingerprinting path
Measured 8.2 ms per stable_fingerprinter call on a 2404-node ITIR tree (~3.4 µs/node), re-paid several times across the chained CachedSteps per program even on warm-disk-cache startup — noticeable for sessions compiling many programs (icon4py). Profile attribution:
_dataclass_fieldsrecomputesdataclasses.fields()/datamodels.get_fields()per object instead of caching per type, andfingerprint_fallbackre-filters field metadata per instance (~27% — the old code cached this with@functools.cacheon_get_metadata_based_state_getstate);- digests round-trip through
hexdigest()strings re-encoded with.encode('ascii')per edge (~13%; rawdigest()bytes sort fine for the order-insensitive case, only the root needs hex); catabolizeusesdataclasses.replaceper internal node (~8%; direct construction skips the field-introspection machinery);CachedStep.cache_keyre-fingerprints the constant frozenselfon every call instead of once per instance (catabolize's memo dict is call-local).
Address review findings on the structural-fingerprinting refactor: - Restore dedicated cache key functions (compilation_hash, fingerprint_compilable_program) for the gtfn/dace executor and persistent translation caches, reimplemented on the new machinery: location/type-agnostic program fingerprint, order-sensitive by-id offset providers for the in-memory executor (fixes silent wrong results from reordered offset_provider dicts and avoids content-hashing connectivity tables on every lookup), and location-stable persistent keys. - Fingerprint DSL definition functions by source code only (drop filename/line/column) so textually identical operators match. - Fix DataModelABC.__subclasshook__ to defer non-base checks via NotImplemented. - Rename env var to GT4PY_BUILD_CACHE_VERSION_ID. - Cache CachedStep self-fingerprint instead of re-walking the step graph per lookup. - Unify the three drifted fields-deconstruction sites. - Fix WorkflowPatterns.md cached-step example.
…/subclass fingerprinting, perf Round 2, addressing the automated PR review findings: - [2/10] Restore location-SENSITIVE frontend-stage cache keys. The PR made FOAST/PAST stage keys and DSL-definition keys location-insensitive, so two textually identical operators in different files aliased to one cached lowering, baking the first's SourceLocations into the second (wrong error locations). FOAST/PAST node fingerprints now include 'location'; the DSL definition function is fingerprinted by its full SourceDefinition again. Location-agnostic fingerprinting remains for the ITIR persistent cache and var-name generation. Adds a location-sensitivity regression test. - [3/10] Fix hard crash at @field_operator decoration when closure vars hold a locally-defined FrozenNamespace subclass (icon4py constants idiom): register a content-based deconstructor for eve.utils.Namespace instead of falling through to __reduce_ex__, which referenced the non-importable local class. - [7/10] Capture builtin-container *subclass* instance __dict__ state so a dict/ list/set subclass with extra attributes no longer collides with a plain instance (false cache hit). Exact builtins are unaffected. - [10/10] Cheap fingerprinting-hot-path wins: cache _dataclass_fields per type; construct the recombined Deconstruction directly instead of dataclasses.replace. [1/10],[4/10],[5/10],[6/10] were already addressed in the previous commit.
|
Thanks for the thorough automated pass — it caught a real location-handling inversion I'd half-mirrored. Addressed in two commits ( Fixed
Each fix has a unit test; all affected suites are green, mypy/pre-commit clean, and an end-to-end Left as-is (flagging for your call)
🤖 Generated with Claude Code |
…fn cache-key crash) The frontend-stage cache keys fingerprint program-likes (FieldOperator, Program, ...) when they appear in another program's closure variables. The PR removed the dedicated FieldOperator/Program fingerprint registrations, so the whole 'backend' object graph is now traversed. That graph can hold non-importable objects — e.g. the 'unittest.mock.Mock' backend that test_compiled_program swaps in, whose dynamically-created 'Mock' subclass is rejected by the by-qualified-name reference deconstructor — crashing fingerprinting with TypeError (test_compiled_program gtfn variants, internal nomesh CI jobs). Exclude 'backend' from the fingerprint via gt4py_metadata(fingerprint=False): it does not affect the lowering (which is what these stage caches key), the backend-specific compilation is keyed separately in the backend's own caches, and traversing the whole backend graph per cache lookup was wasteful and fragile. Adds a regression test.
| def __getstate__(self) -> dict[str, Any]: | ||
| # Serialize only the dataclass fields, excluding cached properties | ||
| # stored in `__dict__` (which may not be picklable). | ||
| return {f.name: getattr(self, f.name) for f in dataclasses.fields(self)} | ||
|
|
There was a problem hiding this comment.
This was introduced as a fix by claude. Is it needed @egparedes
|
|
||
| from __future__ import annotations | ||
|
|
||
| import collections |
There was a problem hiding this comment.
Maybe the fingerprinting related utils deserve their own module?
| lambda o: workflow.CachedStep( | ||
| o.bare_translation, | ||
| hash_function=stages.fingerprint_compilable_program, | ||
| key_function=stages.fingerprint_compilable_program, |
There was a problem hiding this comment.
The current FileCache key does not include the stage fingerprint. I think it should be added, question is whether to do it now or in a separate PR.
There was a problem hiding this comment.
Another comment about this aspect:
[1/10] persistent ITIR key includes location/type — the gtfn/dace translation FileCache now keys on a restored stages.fingerprint_compilable_program built on itir.semantic_fingerprinter (skips location/type), so unrelated edits / checkout moves no longer bust the persistent cache.
In my opinion, the FileCache key should include the type: the translated program should not clash in disk cache for different types.
| - The OTF build cache (`CachedStep`) keys compiled artifacts by a fingerprint | ||
| of the workflow step **and** its input, so a cached compiled program can be | ||
| reused only when both are unchanged. |
There was a problem hiding this comment.
This text does not match the key of the disk cache in translation stage (see my other comments).
Summary
Generalizes and unifies the fingerprinting / cache-key infrastructure used by the
gt4py.nextworkflow caches, so cache keys are deterministic across interpreter runs and get invalidated when any relevant state changes — the cached step's own configuration and the gt4py version, not just the step input.Supersedes #2609, keeping its caching policy (
cache_key = H(BUILD_CACHE_VERSION_ID, step, key_function(input)), default-include with explicit field opt-out) while replacing the pickle-based mechanism with structural Merkle-style hashing, addressing the issues raised in review (RecursionErroron realistic IR depths, local-enum regression, non-orderable dict keys,OrderedDictfalse cache hits, identity-sensitivity, conflated pickling contracts). It incorporates @havogt's proposal egparedes#8 by merge, preserving authorship.The design and rationale are documented in the new ADR
docs/development/ADRs/next/0023-Fingerprinting.md.Key changes
gt4py.next.utils), separating the traversal scheme from the reduction logic:EmptyDeconstruction/Deconstruction, dispatched on the MRO (eve.utils.singledispatcher); defaults cover class-tagged primitives, ordered tuples/lists, digest-sorted dicts/sets (unsortable keys just work), enums by member content, types/functions/modules by identity-verified qualified name, dataclasses/datamodels by fields honoringgt4py_metadata(fingerprint=False), and a__reduce_ex__fallback that honorscopyreg.dispatch_tablereducers (e.g. NumPy ufuncs).catabolize, a generic iterative post-order fold with identity-based memoization, cycle back-references (allow_cycles) and canonical ordering of order-insensitive pieces — no recursion limit (a depth-5000 ITIR chain fingerprints in ~120 ms; the pickle-based mechanism failed at depth ~74), and fingerprints are a pure function of value, not object identity.catabolizepartially applied to a deconstructor and this collapser.CachedSteprenameshash_function→key_function, excludeskey_function/cachefrom its own fingerprint, and keys onstable_fingerprinter((BUILD_CACHE_VERSION_ID, self, key_function(inp))). Newconfig.BUILD_CACHE_VERSION_ID(defaults to the gt4py version, env-overridable) salts the cache. Runners (gtfn,dace, ...) adoptstable_fingerprinteras theirkey_function.MetadataBasedPicklingMixin/ custom-pickler machinery is removed; classes keep standard serialization. The ffront/ITIRsemantic_fingerprinters are deconstructor compositions (DSL definition functions fingerprinted by source + closure variables).WorkflowPatterns.mdupdated; unit tests for thecatabolizedriver, the fingerprint deconstructors, andCachedStep.Testing
tests/next_tests/unit_tests+tests/eve_tests: 1879 + 362 passed (only pre-existing GPU/CuPy environment parametrizations fail on a machine without CuPy).tests/next_tests/integration_tests -k roundtrip: green, includingtest_constant_closure_vars_with_enums(local enums in closure variables, which fails with the pickle-based mechanism) and thecbrt/test_rootsufunc cases.RecursionErrorrepro from the fix[next]: extend fingerprinting mechanism to consider all relevant state in caches #2609 review passes; cross-process determinism verified.pre-commit(ruff, mypy, tach) green on all changed files.