fix[next]: extend fingerprinting to cover all relevant cache state via structural hashing by egparedes · Pull Request #2648 · GridTools/gt4py

egparedes · 2026-06-10T16:57:29Z

Summary

Generalizes and unifies the fingerprinting / cache-key infrastructure used by the gt4py.next workflow caches, so cache keys are deterministic across interpreter runs and get invalidated when any relevant state changes — the cached step's own configuration and the gt4py version, not just the step input.

Supersedes #2609, keeping its caching policy (cache_key = H(BUILD_CACHE_VERSION_ID, step, key_function(input)), default-include with explicit field opt-out) while replacing the pickle-based mechanism with structural Merkle-style hashing, addressing the issues raised in review (RecursionError on realistic IR depths, local-enum regression, non-orderable dict keys, OrderedDict false cache hits, identity-sensitivity, conflated pickling contracts). It incorporates @havogt's proposal egparedes#8 by merge, preserving authorship.
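A minimal sketch of that caching policy, assuming a toy fingerprinter. The names `fingerprint`, `Scale` and `CachedStepSketch`, the `"fingerprint"` metadata key (standing in for `gt4py_metadata(fingerprint=False)`), and the SHA-256-over-text hashing are illustrative assumptions, not gt4py's API or digest scheme. The sketch shows the three key ingredients: a version salt, the step's own fields (with an explicit opt-out for `key_function` and `cache`), and the input key.

```python
import dataclasses
import hashlib
from typing import Any, Callable

BUILD_CACHE_VERSION_ID = "1.0.0"  # hypothetical value; gt4py defaults it to its own version


def fingerprint(obj: Any) -> str:
    """Toy fingerprinter: dataclass instances contribute every field unless opted out."""
    if dataclasses.is_dataclass(obj) and not isinstance(obj, type):
        parts = [type(obj).__qualname__]
        for f in dataclasses.fields(obj):
            if f.metadata.get("fingerprint", True):  # default-include, explicit opt-out
                parts.append(f"{f.name}={fingerprint(getattr(obj, f.name))}")
        payload = "|".join(parts)
    elif callable(obj):
        # Functions and classes are referenced by name, not by memory address.
        payload = f"{obj.__module__}.{obj.__qualname__}"
    else:
        payload = repr(obj)
    return hashlib.sha256(payload.encode()).hexdigest()


@dataclasses.dataclass(frozen=True)
class Scale:
    """A step whose configuration (`factor`) must be part of the cache key."""

    factor: int

    def __call__(self, x: int) -> int:
        return x * self.factor


@dataclasses.dataclass
class CachedStepSketch:
    step: Callable[[Any], Any]
    key_function: Callable[[Any], str] = dataclasses.field(
        default=fingerprint, metadata={"fingerprint": False}
    )
    cache: dict = dataclasses.field(
        default_factory=dict, repr=False, metadata={"fingerprint": False}
    )

    def cache_key(self, inp: Any) -> str:
        # H(version salt, the step's own state, key_function(input))
        return fingerprint((BUILD_CACHE_VERSION_ID, fingerprint(self), self.key_function(inp)))

    def __call__(self, inp: Any) -> Any:
        key = self.cache_key(inp)
        if key not in self.cache:
            self.cache[key] = self.step(inp)
        return self.cache[key]
```

Two steps that differ only in configuration (`Scale(2)` vs `Scale(3)`) get different keys for the same input, while the contents of `cache` never perturb the key.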

The design and rationale are documented in the new ADR docs/development/ADRs/next/0023-Fingerprinting.md.

Key changes

- Structural fingerprinter as a layered catamorphism (gt4py.next.utils), separating the traversal scheme from the reduction logic:
  - Deconstruction: per-type deconstructors peel one level into EmptyDeconstruction / Deconstruction, dispatched on the MRO (eve.utils.singledispatcher); defaults cover class-tagged primitives, ordered tuples/lists, digest-sorted dicts/sets (so unsortable keys just work), enums by member content, types/functions/modules by identity-verified qualified name, dataclasses/datamodels by fields honoring gt4py_metadata(fingerprint=False), and a __reduce_ex__ fallback that honors copyreg.dispatch_table reducers (e.g. NumPy ufuncs).
  - Traversal: catabolize, a generic iterative post-order fold with identity-based memoization, cycle back-references (allow_cycles) and canonical ordering of order-insensitive pieces. It has no recursion limit (a depth-5000 ITIR chain fingerprints in ~120 ms; the pickle-based mechanism failed at depth ~74), and fingerprints are a pure function of value, not of object identity.
  - Reduction: a single fingerprint collapser (the catamorphism's algebra) hashing deconstructions uniformly with xxh64 and domain separation; fingerprinters are catabolize partially applied to a deconstructor and this collapser.
- Cache keys: CachedStep renames hash_function → key_function, excludes key_function/cache from its own fingerprint, and keys on stable_fingerprinter((BUILD_CACHE_VERSION_ID, self, key_function(inp))). A new config.BUILD_CACHE_VERSION_ID (defaults to the gt4py version, env-overridable) salts the cache. Runners (gtfn, dace, ...) adopt stable_fingerprinter as their key_function.
- Fingerprinting no longer interferes with real pickling: the MetadataBasedPicklingMixin / custom-pickler machinery is removed, and classes keep standard serialization. The ffront/ITIR semantic_fingerprinters are deconstructor compositions (DSL definition functions are fingerprinted by source + closure variables).
- Docs / tests: ADR 0023 (pickle- and optree-based designs recorded under alternatives considered); WorkflowPatterns.md updated; unit tests for the catabolize driver, the fingerprint deconstructors, and CachedStep.
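The three layers above can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the code in gt4py.next.utils: `hashlib.blake2b` with an 8-byte digest stands in for xxh64, only ints, strs, lists, tuples and dicts get deconstructors, and cycles raise `ValueError` instead of becoming back-references (`allow_cycles` is not modelled).

```python
import functools
import hashlib
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Deconstruction:
    state: bytes          # class tag plus any atomic payload
    pieces: tuple = ()    # child objects still to be folded
    ordered: bool = True  # False: piece digests are combined in canonical (sorted) order


# Deconstruction layer: peel exactly one level, dispatched on the MRO.
@functools.singledispatch
def deconstruct(obj: Any) -> Deconstruction:
    raise TypeError(f"no deconstructor for {type(obj).__qualname__}")


@deconstruct.register(int)
@deconstruct.register(str)
def _atom(obj: Any) -> Deconstruction:
    return Deconstruction(state=f"{type(obj).__qualname__}:{obj!r}".encode())


@deconstruct.register(list)
@deconstruct.register(tuple)
def _sequence(obj: Any) -> Deconstruction:
    return Deconstruction(state=type(obj).__qualname__.encode(), pieces=tuple(obj))


@deconstruct.register(dict)
def _mapping(obj: dict) -> Deconstruction:
    # Each (key, value) pair is a piece; sorting pair digests makes key order irrelevant.
    return Deconstruction(state=b"dict", pieces=tuple(obj.items()), ordered=False)


# Reduction layer (the algebra): hash one node from its state and child digests.
def collapse(state: bytes, child_digests: list) -> bytes:
    h = hashlib.blake2b(digest_size=8)  # stdlib stand-in for xxh64
    h.update(b"node" + len(state).to_bytes(8, "little") + state + b"pieces")
    for digest in child_digests:
        h.update(digest)
    return h.digest()


# Traversal layer: iterative post-order fold, memoized by identity.
def catabolize(obj: Any, *, deconstruct: Callable, collapse: Callable) -> bytes:
    memo: dict = {}          # id(node) -> digest
    keep_alive: list = []    # pins memoized nodes so their ids cannot be reused mid-call
    in_progress: set = set()
    stack: list = [(obj, None)]
    while stack:
        node, dec = stack.pop()
        if id(node) in memo:
            continue
        if dec is None:  # first visit: expand, then revisit after the children
            if id(node) in in_progress:
                raise ValueError("cycle detected (this sketch has no allow_cycles)")
            in_progress.add(id(node))
            dec = deconstruct(node)
            stack.append((node, dec))
            stack.extend((p, None) for p in reversed(dec.pieces) if id(p) not in memo)
            continue
        digests = [memo[id(p)] for p in dec.pieces]
        if not dec.ordered:
            digests.sort()
        memo[id(node)] = collapse(dec.state, digests)
        keep_alive.append(node)
        in_progress.discard(id(node))
    return memo[id(obj)]


fingerprint = functools.partial(catabolize, deconstruct=deconstruct, collapse=collapse)
```

Because the driver keeps its own stack, nesting depth is bounded by memory rather than by the interpreter's recursion limit, and the identity memo only avoids re-hashing shared subobjects: `[shared, shared]` and `[[1], [1]]` get the same digest.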

Testing

- tests/next_tests/unit_tests + tests/eve_tests: 1879 + 362 passed (only the pre-existing GPU/CuPy environment parametrizations fail on a machine without CuPy).
- tests/next_tests/integration_tests -k roundtrip: green, including test_constant_closure_vars_with_enums (local enums in closure variables, which fails with the pickle-based mechanism) and the cbrt/test_roots ufunc cases.
- The depth-80 ITIR RecursionError repro from the review of #2609 (fix[next]: extend fingerprinting mechanism to consider all relevant state in caches) now passes; cross-process determinism was verified.
- pre-commit (ruff, mypy, tach) is green on all changed files.

…g in fingerprinters
- Add `fingerprint` (alias for `sorting_sets_fingerprinter`) and `versioned_fingerprint` (includes BUILD_CACHE_VERSION_ID) to `utils.py`
- Fix `skipping_fields_node_fingerprinter` to pass the reducer dict as a positional arg (not keyword) to `CustomPicklingFingerprinter.from_reducers`
- Fix `custom_overriden_pickler` in `eve/utils.py` to use `pickle._Pickler` (pure Python) instead of the C-extension `pickle.Pickler` so that `reducer_override` is called for built-in types like `dict` and `set` (the C-extension fast path bypasses `reducer_override` for built-in types)
- Update `test_cached_with_hashing` to use a module-level function instead of a lambda (lambdas can't be pickled, now that `cache_key` fingerprints `self`)
- Add tests for `fingerprint`, `versioned_fingerprint`, and `cache_key` behavior

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…on stage classes Re-add the `fingerprinter` module-level alias (= `semantic_fingerprinter`) and the `fingerprint` computed property on all four stage dataclasses (`DSLFieldOperatorDef`, `FOASTOperatorDef`, `DSLProgramDef`, `PASTProgramDef`). These were removed when the old `FingerprintedABC`/`FingerprintedMixin` system was dropped in the 'More refactoring' commit but are still needed by existing tests and callers. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Re-add the `fingerprint()` method to the `ir.Node` base class that was removed along with the `FingerprintedABC`/`FingerprintedMixin` system. The method is needed by:
- `ffront.lowering_utils` (uses `itir.Expr.fingerprint()` to generate unique variable names)
- `ffront.foast_to_gtir` (uses `itir.Expr.fingerprint()` to generate unique SymRef names for conditionals)
- `iterator_tests/test_ir.py` (tests that fingerprinting ignores location/type)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>




Extractor (formerly ObjectDecomposer), CompositeContent (formerly ObjectDecomposition), AtomicContent (formerly DecompositionAtom), AtomicReducer (formerly AtomReducer), with matching extract/atomic_reducer parameter names and prose updates in docstrings, tests and ADR 0023.

make_fingerprinter() now takes an Extractor instead of a mapping of per-type overrides; the new make_extractor(overrides) builds extractors from the default rules, and the module-level 'extract' is the default Extractor. skipping_fields_node_fingerprinter returns extractor overrides (return_extractors=) for composition via make_extractor.


Deconstructor (formerly Extractor), Deconstruction (formerly CompositeContent), EmptyDeconstruction (formerly AtomicContent), Collapser (formerly AtomicReducer), CompositeCollapser (formerly CompositeReducer), with matching deconstruct/collapser parameter and factory names (make_deconstructor, return_deconstructors, ...). Also reconcile with the reshaped driver API: cycles are now collapsed by the regular collapser as back-reference EmptyDeconstructions when allow_cycles=True (the CycleCollapser alias and the fingerprint cycle collapser are gone); fingerprint digests are unchanged.

Deconstruction becomes the base Collection of deconstructed pieces with EmptyDeconstruction (terminal, no pieces) and OrderInsensitiveDeconstruction (pieces collapse in canonical order) subclasses, replacing the 'ordered' flag; builders (from_pieces, from_typed_value, from_reference) are the construction API and all call sites, tests, docstrings and ADR 0023 adopt the pieces vocabulary. Fingerprint digests are unchanged.

- reduce_object takes a single 'collapser' (renamed param 'deconstructor'): composites are collapsed by re-wrapping the already-collapsed piece results via dataclasses.replace, and the driver canonicalizes the order of OrderInsensitiveDeconstruction results itself.
- The fingerprint digest scheme is unified accordingly (xxh64('node' + state + 'pieces' + digests) for empty and composite alike), intentionally changing all fingerprint values (no migration needed: stale cache keys are simply never hit again).
- make_fingerprinter is replaced by functools.partial composition; make_deconstructor gains a 'fallback' parameter and fingerprinters use fingerprint_fallback, which layers the gt4py_metadata(fingerprint=False) opt-out over the default dataclass/datamodel field deconstruction. Dispatching dataclasses/datamodels through virtual ABC registry keys (xtyping.DataclassABC / new datamodels.DataModelABC) was attempted but breaks stdlib singledispatch MRO composition for eve's generic datamodel classes (RuntimeError: Inconsistent hierarchy), so they are handled in the fallbacks instead.
- New eve datamodels.DataModelABC (virtual ABC via subclass hook) with unit test.


havogt

Automated code review (Claude, high effort: 7 finder angles, per-finding adversarial verification with reproduction scripts). 10 findings, ranked most severe first as inline comments.

The two most consequential findings pull in opposite directions and suggest the location-handling changed by accident, not design: the ffront-stage cache keys became location-insensitive (cross-file cache aliasing) while the ITIR-level persistent cache keys became location-sensitive (permanent cache misses on unrelated edits). upstream/main had exactly the opposite arrangement in both places.

🤖 Generated with Claude Code

havogt · 2026-06-11T16:42:29Z

                lambda o: workflow.CachedStep(
                    o.bare_translation,
-                    hash_function=stages.fingerprint_compilable_program,
+                    key_function=utils.stable_fingerprinter,


[1/10] Persistent translation cache key now includes location and type node fields

This CachedStep (backed by a persistent FileCache) switched from the location/type-skipping fingerprint_compilable_program to plain utils.stable_fingerprinter, which hashes every ITIR node's SourceLocation (absolute file path, line, column) and lazily-inferred type field. Same issue in dace/workflow/factory.py:50.

Verified empirically: stable_fingerprinter(SymRef(id='x', location=SourceLocation(1, 1, '/path/a.py'))) differs from the same node with a different path, while the old key was equal.

Impact: with GT4PY_BUILD_CACHE_LIFETIME=persistent (e.g. icon4py), inserting a blank line above a stencil, moving the checkout, or sharing the cache dir across paths changes every key → guaranteed miss, full C++/SDFG recompilation, unbounded cache growth. The skipping_fields_node_fingerprinter('location', 'type') deconstructors already exist (via return_deconstructors=True) and could be composed into this key function.
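To illustrate what composing field-skipping into the key function buys, here is a toy fingerprinter that omits named fields. `SourceLocation`, `SymRef` and `skipping_fields_fingerprint` are hypothetical stand-ins (the thread's real helper is `skipping_fields_node_fingerprinter('location', 'type')`), and which fields belong in the skip set is a policy choice.

```python
import dataclasses
import hashlib
from typing import Any, FrozenSet, Optional


@dataclasses.dataclass(frozen=True)
class SourceLocation:  # toy stand-in, not gt4py's class
    line: int
    column: int
    filename: str


@dataclasses.dataclass(frozen=True)
class SymRef:  # toy stand-in for an ITIR node
    id: str
    location: Optional[SourceLocation] = None
    type: Optional[str] = None


def skipping_fields_fingerprint(obj: Any, skip: FrozenSet[str] = frozenset()) -> str:
    """Hash dataclass nodes field by field, omitting any field whose name is in `skip`."""
    h = hashlib.sha256()

    def feed(o: Any) -> None:  # recursive for brevity; the real driver is iterative
        if dataclasses.is_dataclass(o) and not isinstance(o, type):
            h.update(f"<{type(o).__qualname__}".encode())
            for f in dataclasses.fields(o):
                if f.name not in skip:
                    h.update(f"{f.name}=".encode())
                    feed(getattr(o, f.name))
            h.update(b">")
        else:
            h.update(f"{type(o).__qualname__}:{o!r};".encode())

    feed(obj)
    return h.hexdigest()


LOCATION_FREE = frozenset({"location", "type"})
a = SymRef("x", SourceLocation(1, 1, "/path/a.py"))
b = SymRef("x", SourceLocation(1, 1, "/path/b.py"), type="int32")
```

With the default (empty) skip set, `a` and `b` fingerprint differently; with `LOCATION_FREE`, they collide, which is the behaviour wanted for a persistent key that should survive unrelated edits and checkout moves.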

havogt · 2026-06-11T16:42:29Z

+#: Fingerprinter for the frontend stages: skips source locations on AST nodes
+#: and fingerprints DSL definition functions by their source code and closure
+#: variables (instead of by qualified name).
+semantic_fingerprinter: utils.Fingerprinter = functools.partial(


[2/10] FOAST/PAST cache keys became location-insensitive → cached lowering shared across files with wrong SourceLocations

semantic_fingerprinter skips location on all FOAST/PAST nodes. This inverts the invariant the deleted test_stages.py tests asserted (test_fingerprint_stage_foast_op_def / test_fingerprint_stage_past_def: same code at different locations must fingerprint differently).

Verified empirically: two field operators with byte-identical source in mod_one.py and mod_two.py produce equal FOAST-stage fingerprints. The process-wide foast_to_gtir/past_to_itir CachedSteps then return the first operator's cached lowering — with mod_one.py SourceLocations baked in by PreserveLocationVisitor — for the second operator, so its error messages, warnings, and dace debug info point at the wrong file/line.
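The aliasing mechanism can be reproduced with a plain dictionary cache. In this hypothetical sketch, `FieldOpDef`, `lower` and `make_cached` are toy stand-ins for a frontend stage, a location-preserving lowering and a `CachedStep`; the only difference between the two caches is whether the key includes the file name.

```python
import dataclasses
from typing import Any, Callable, Dict, Hashable


@dataclasses.dataclass(frozen=True)
class FieldOpDef:  # toy stand-in for a FOAST-stage definition
    source: str
    filename: str


def lower(op: FieldOpDef) -> Dict[str, str]:
    # Like a location-preserving lowering, the output embeds where the code came from.
    return {"code": op.source.upper(), "location": op.filename}


def make_cached(step: Callable[[FieldOpDef], Any], key: Callable[[FieldOpDef], Hashable]):
    cache: Dict[Hashable, Any] = {}

    def cached(op: FieldOpDef) -> Any:
        k = key(op)
        if k not in cache:
            cache[k] = step(op)
        return cache[k]

    return cached


one = FieldOpDef("return a + b", "mod_one.py")
two = FieldOpDef("return a + b", "mod_two.py")

insensitive = make_cached(lower, key=lambda op: op.source)               # ignores location
sensitive = make_cached(lower, key=lambda op: (op.source, op.filename))  # includes location
insensitive(one)
sensitive(one)
print(insensitive(two)["location"])  # mod_one.py: stale location served for `two`
print(sensitive(two)["location"])    # mod_two.py
```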

havogt · 2026-06-11T16:42:30Z

+    return target is obj
+
+
+def _reference_by_fully_qualified_name(obj: Any) -> str:


[3/10] Regression: locally-defined FrozenNamespace subclass instances in closure vars now crash at decoration

The old fingerprint path (add_content_to_fingerprint) had a never-failing str(obj) fallback. Now _reference_by_fully_qualified_name raises TypeError for locally-defined classes reachable via __reduce_ex__.

Verified empirically against both venvs: a field operator whose closure vars hold a locally-defined eve.utils.FrozenNamespace subclass instance — a frontend-valid ConstantPythonNamespaceObject (type_translation.py:262) and the icon4py constants idiom — crashes at @field_operator decoration with TypeError: Objects which are not importable under their qualified name ('__main__.....<locals>.Consts') ... cannot be safely referenced, while upstream/main decorates and compiles fine. Module-level namespaces, enums, and scalars are unaffected — narrow, but a hard crash before compilation.
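A sketch of an identity-verified qualified-name reference, assuming the check works roughly as the error message suggests: import the module, walk the qualified name, and accept the result only if it is the very same object. `reference_by_qualified_name` and `make_local_namespace_class` are hypothetical names, and the error text is illustrative; the fix adopted later in the thread sidesteps this path for namespaces by fingerprinting them by content.

```python
import importlib
from typing import Any


def reference_by_qualified_name(obj: Any) -> str:
    """Return 'module:qualname' only if importing that path yields `obj` itself."""
    module_name = getattr(obj, "__module__", None)
    qualname = getattr(obj, "__qualname__", None)
    if module_name and qualname and "<" not in qualname:  # rejects <locals> and <lambda>
        try:
            target: Any = importlib.import_module(module_name)
            for part in qualname.split("."):
                target = getattr(target, part)
        except (ImportError, AttributeError):
            target = None
        if target is obj:  # identity check: a same-named but different object is rejected
            return f"{module_name}:{qualname}"
    raise TypeError(f"{module_name}.{qualname} is not importable under its qualified name")


def make_local_namespace_class() -> type:
    class Consts:  # qualname: 'make_local_namespace_class.<locals>.Consts'
        PI = 3.14

    return Consts


print(reference_by_qualified_name(importlib.import_module))  # importlib:import_module
```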

havogt · 2026-06-11T16:42:30Z

        )
        device_type = core_defs.DeviceType.CPU
-        hash_function = stages.compilation_hash
+        key_function = utils.stable_fingerprinter


[4/10] Executor cache key content-hashes full neighbor tables (D2H copy for CuPy)

The replaced stages.compilation_hash hashed offset_provider by identity via common.hash_offset_provider_items_by_id (~1.2 µs, O(1) in mesh size). stable_fingerprinter has no ndarray deconstructor, so connectivity arrays inside CompileTimeArgs.offset_provider fall through to __reduce_ex__(2) = full data bytes: measured ~2.9 ms per 1M-entry int32 table (~2400× slower). For CuPy arrays __reduce__ calls .get(), i.e. a synchronous device-to-host copy. Same in dace/workflow/backend.py:52.

Impact: once per program variant on the main CompiledProgramsPool path (compile/startup cost), but per-invocation on the iterator/runtime.py:92 fendef dispatcher and FieldOperatorFromFoast.__call__ (decorator.py:715) paths — every call pays a multi-MB mesh re-hash plus a D2H transfer on GPU.
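For comparison, a by-identity key in the spirit of `common.hash_offset_provider_items_by_id` (an illustrative sketch, not its implementation); a `bytearray` stands in for a connectivity table, so no NumPy or CuPy is needed.

```python
import hashlib
from typing import Any, Mapping


def hash_offset_provider_by_id(offset_provider: Mapping[str, Any]) -> str:
    """Key offset providers by name and object identity, never by array contents.

    Cost is O(number of entries), independent of table size, and device arrays are
    never copied to the host. Insertion order is part of the key. Identity keys are
    only meaningful while the keyed objects stay alive (ids can be reused afterwards).
    """
    h = hashlib.sha256()
    for name, connectivity in offset_provider.items():
        h.update(f"{name}\0{id(connectivity)}\0".encode())
    return h.hexdigest()


table = bytearray(4_000_000)  # stands in for a large neighbor table
same_content = bytearray(4_000_000)
```

The trade-off is that two tables with equal contents but different identities miss each other's cache entries, which is acceptable for an in-memory executor cache but not for a persistent one.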

havogt · 2026-06-11T16:42:30Z

+#: developers or advanced users when making changes requiring forced
+#: compatibility with previously cached builds.
+BUILD_CACHE_VERSION_ID: str = (
+    os.environ.get("BUILD_CACHE_VERSION_ID") or _get_build_cache_version_id()


[5/10] Env var missing the GT4PY_ prefix

Every other env var in this file uses the GT4PY_ prefix (GT4PY_BUILD_CACHE_DIR, GT4PY_BUILD_CACHE_LIFETIME, GT4PY_BUILD_JOBS, ...); this one reads bare BUILD_CACHE_VERSION_ID.

A user following the convention sets GT4PY_BUILD_CACHE_VERSION_ID and is silently ignored; conversely an unrelated CI/HPC tool exporting the generic name BUILD_CACHE_VERSION_ID silently pins or perturbs every gt4py cache key — including reusing stale binaries across gt4py upgrades, exactly what this salt exists to prevent. (The ADR documents the unprefixed name too and would need updating.)
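A minimal sketch of the prefixed lookup, keeping the `os.environ.get(...) or default` shape of the diff above; `build_cache_version_id` is a hypothetical helper name and `default` stands in for `_get_build_cache_version_id()`.

```python
import os


def build_cache_version_id(default: str) -> str:
    """Read the cache salt from the GT4PY_-prefixed variable, else use `default`."""
    # An unset or empty variable falls back to the default (e.g. the gt4py version);
    # the bare BUILD_CACHE_VERSION_ID name is deliberately not consulted.
    return os.environ.get("GT4PY_BUILD_CACHE_VERSION_ID") or default
```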

havogt · 2026-06-11T16:42:30Z

+    __datamodel_params__: ClassVar[utils.FrozenNamespace[Any]]
+
+    @classmethod
+    def __subclasshook__(cls, subclass: type) -> bool:


[6/10] __subclasshook__ returns False instead of NotImplemented, breaking the ABC hook contract

The hook ignores cls and never returns NotImplemented. Verified empirically:

for class MyBase(DataModelABC), issubclass(AnyUnrelatedDatamodel, MyBase) returns True (hook inherited, cls ignored);

a genuine direct subclass class X(DataModelABC) fails issubclass(X, DataModelABC) because returning False suppresses the normal MRO fallback that NotImplemented would allow.

Correct pattern:

    if cls is DataModelABC:
        return is_datamodel(subclass)
    return NotImplemented

(Note: DataModelABC currently has no production callers — see the dead-infrastructure comment on eve/utils.py.)
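A runnable sketch of the hook contract, with a hypothetical `is_datamodel` that treats dataclasses as datamodels. In this sketch, returning `is_datamodel(subclass)` directly would still make `issubclass(Direct, DataModelABC)` False, because a `False` from the hook also suppresses the MRO fallback; the version below returns `True` only on a positive match and `NotImplemented` otherwise, which is the form adopted later in the thread.

```python
import abc
import dataclasses


def is_datamodel(cls: type) -> bool:
    # Hypothetical stand-in for datamodels.is_datamodel: dataclasses count as datamodels.
    return isinstance(cls, type) and dataclasses.is_dataclass(cls)


class DataModelABC(abc.ABC):
    @classmethod
    def __subclasshook__(cls, subclass: type):
        # Answer only for the base ABC, and only positively; otherwise defer
        # (NotImplemented) to the normal MRO and registry checks.
        if cls is DataModelABC and is_datamodel(subclass):
            return True
        return NotImplemented


@dataclasses.dataclass
class Point:  # a "datamodel" that never inherits from DataModelABC
    x: int


class MyBase(DataModelABC):  # derived ABC: inherits the hook
    pass


class Direct(DataModelABC):  # genuine subclass that is not itself a datamodel
    pass
```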

havogt · 2026-06-11T16:42:30Z

+    bytearray: lambda obj: EmptyDeconstruction.from_typed_value(type(obj), bytes(obj)),
+    tuple: lambda obj: Deconstruction.from_pieces(*obj, state=_class_obj_tag(obj)),
+    list: lambda obj: Deconstruction.from_pieces(*obj, state=_class_obj_tag(obj)),
+    dict: lambda obj: Deconstruction.from_pieces(


[7/10] dict/list subclass instance state silently dropped by MRO dispatch

singledispatch walks the MRO, so this dict entry also captures all dict subclasses, hashing only the class tag + items() and bypassing the __reduce_ex__ fallback that would capture extra instance state.

Verified empirically: two instances of a dict subclass with equal items but different extra-attribute values produce identical fingerprints (same for list subclasses). Reachable via closure_vars: dict[str, Any] in ffront stages or a user-supplied offset_provider mapping → false cache hit returns an artifact compiled for the other object.
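The dispatch behaviour is easy to see with `functools.singledispatch` directly. In this toy sketch (`state_of` and `TaggedDict` are hypothetical), a subclass resolves to the `dict` rule, so the rule itself has to add the instance `__dict__`, which is how the later fix in the thread describes it.

```python
import functools
from typing import Any


@functools.singledispatch
def state_of(obj: Any) -> tuple:
    # Toy fallback: class name plus the instance __dict__.
    return (type(obj).__qualname__, tuple(sorted(vars(obj).items())))


@state_of.register(dict)
def _dict_state(obj: dict) -> tuple:
    items = tuple(sorted(obj.items()))
    if type(obj) is dict:
        return ("dict", items)
    # dict subclasses also land here via the MRO, so their extra instance
    # attributes must be folded in explicitly or they are silently dropped.
    return (type(obj).__qualname__, items, tuple(sorted(vars(obj).items())))


class TaggedDict(dict):
    pass


a, b = TaggedDict(x=1), TaggedDict(x=1)
a.tag, b.tag = "A", "B"
```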

havogt · 2026-06-11T16:42:30Z

    step: Workflow[StartT, EndT]
-    hash_function: Callable[[StartT], HashT] = dataclasses.field(default=hash)  # type: ignore[assignment]
-    cache: OpaqueMutableMapping[HashT, EndT] = dataclasses.field(repr=False, default_factory=dict)
+    key_function: Callable[[StartT], HashT] = dataclasses.field(


[8/10] Breaking CachedStep API change with no deprecation path

hash_function (default=hash) was renamed to key_function and is now required, and cache_key fingerprints self, so steps containing lambdas/closures raise at first call (a test removed by this PR explicitly used step=lambda inp: [*inp, 1]).

Verified empirically:

CachedStep(step=tuple) → TypeError: missing ... 'key_function'

CachedStep(step=tuple, hash_function=str) → unexpected keyword

CachedStep(step=lambda inp: [*inp, 1], key_function=str)([1]) → TypeError: not importable under their qualified name

The new docstring and WorkflowPatterns.md document the restriction, so this is clearly intentional — but downstream code (icon4py-style custom workflows, the pre-PR WorkflowPatterns.md recipe) breaks at construction or first call with no deprecation period.
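One possible deprecation path, along the lines of the transitional alias offered later in the thread, is an `InitVar` that forwards the old keyword with a warning. This hypothetical shim (not gt4py's class) covers only the two construction-time failures; a lambda step would still fail once `self` is fingerprinted.

```python
import dataclasses
import warnings
from typing import Any, Callable, Optional


@dataclasses.dataclass
class CachedStep:  # hypothetical shim, not gt4py's class
    step: Callable[[Any], Any]
    key_function: Optional[Callable[[Any], Any]] = None
    hash_function: dataclasses.InitVar[Optional[Callable[[Any], Any]]] = None
    cache: dict = dataclasses.field(default_factory=dict, repr=False)

    def __post_init__(self, hash_function: Optional[Callable[[Any], Any]]) -> None:
        if hash_function is not None:
            if self.key_function is not None:
                raise TypeError("pass either 'key_function' or 'hash_function', not both")
            warnings.warn(
                "'hash_function' is deprecated, use 'key_function'",
                DeprecationWarning,
                stacklevel=3,  # point at the caller of __init__
            )
            self.key_function = hash_function
        if self.key_function is None:
            raise TypeError("CachedStep requires 'key_function'")
```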

havogt · 2026-06-11T16:42:30Z

+    return cast(xtyping.SingleDispatchCallable[P, T], result)
+
+
+def merge_dispatchers(


[9/10] Dead-on-arrival eve infrastructure (3 items)

Verified by grep — each is referenced only by its own new unit tests, with zero production callers:

merge_dispatchers (here, ~30 lines + 3 tests) — ffront/stages.py:59 composes deconstructors via plain dict merging into make_deconstructor({**foast.semantic_fingerprint_deconstructors, ...}) instead;

DataModelABC (datamodels/core.py:104) — next/utils.py deliberately avoids it with an explicit comment that virtual-ABC dispatch breaks singledispatch, and calls datamodels.is_datamodel directly;

@runtime_checkable on SingleDispatchCallable (extended_typing.py:228) — no isinstance/issubclass check against the protocol exists anywhere; the pre-existing is_single_dispatch_callable typeguard uses getattr duck-typing.

Same theme, smaller: the empty BaseStage marker class (ffront/stages.py:37) and the fingerprinter = semantic_fingerprinter alias (stages.py:72) are referenced nowhere / tests-only.

havogt · 2026-06-11T16:42:30Z

+_DECONSTRUCT_PICKLE_REDUCE_PROTOCOL: Final[int] = 2
+
+
+def _dataclass_fields(cls: type) -> Optional[tuple[Any, ...]]:


[10/10] Avoidable per-node costs on the fingerprinting path

Measured 8.2 ms per stable_fingerprinter call on a 2404-node ITIR tree (~3.4 µs/node), re-paid several times across the chained CachedSteps per program even on warm-disk-cache startup — noticeable for sessions compiling many programs (icon4py). Profile attribution:

_dataclass_fields recomputes dataclasses.fields() / datamodels.get_fields() per object instead of caching per type, and fingerprint_fallback re-filters field metadata per instance (~27% — the old code cached this with @functools.cache on _get_metadata_based_state_getstate);

digests round-trip through hexdigest() strings re-encoded with .encode('ascii') per edge (~13%; raw digest() bytes sort fine for the order-insensitive case, only the root needs hex);

catabolize uses dataclasses.replace per internal node (~8%; direct construction skips the field-introspection machinery);

CachedStep.cache_key re-fingerprints the constant frozen self on every call instead of once per instance (catabolize's memo dict is call-local).
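A sketch of the first two wins under stated assumptions: the `"fingerprint"` metadata key stands in for `gt4py_metadata(fingerprint=False)`, `blake2b` stands in for xxh64, and `fingerprinted_field_names`, `edge_digest` and `root_fingerprint` are hypothetical names.

```python
import dataclasses
import functools
import hashlib
from typing import List, Optional, Tuple


@functools.cache
def fingerprinted_field_names(cls: type) -> Optional[Tuple[str, ...]]:
    """Computed once per class instead of once per instance."""
    if not dataclasses.is_dataclass(cls):
        return None
    return tuple(
        f.name for f in dataclasses.fields(cls) if f.metadata.get("fingerprint", True)
    )


def edge_digest(payload: bytes) -> bytes:
    # Internal edges pass raw 8-byte digests; bytes sort fine for
    # order-insensitive pieces, so no hexdigest()/encode('ascii') round-trip.
    return hashlib.blake2b(payload, digest_size=8).digest()


def root_fingerprint(child_digests: List[bytes]) -> str:
    # Only the root is rendered as hex, for use as a cache key.
    return hashlib.blake2b(b"".join(sorted(child_digests)), digest_size=8).hexdigest()


@dataclasses.dataclass(frozen=True)
class Node:
    value: int
    location: str = dataclasses.field(default="", metadata={"fingerprint": False})
```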

Address review findings on the structural-fingerprinting refactor:
- Restore dedicated cache key functions (compilation_hash, fingerprint_compilable_program) for the gtfn/dace executor and persistent translation caches, reimplemented on the new machinery: location/type-agnostic program fingerprint, order-sensitive by-id offset providers for the in-memory executor (fixes silent wrong results from reordered offset_provider dicts and avoids content-hashing connectivity tables on every lookup), and location-stable persistent keys.
- Fingerprint DSL definition functions by source code only (drop filename/line/column) so textually identical operators match.
- Fix DataModelABC.__subclasshook__ to defer non-base checks via NotImplemented.
- Rename env var to GT4PY_BUILD_CACHE_VERSION_ID.
- Cache CachedStep self-fingerprint instead of re-walking the step graph per lookup.
- Unify the three drifted fields-deconstruction sites.
- Fix WorkflowPatterns.md cached-step example.

…/subclass fingerprinting, perf
Round 2, addressing the automated PR review findings:
- [2/10] Restore location-SENSITIVE frontend-stage cache keys. The PR made FOAST/PAST stage keys and DSL-definition keys location-insensitive, so two textually identical operators in different files aliased to one cached lowering, baking the first's SourceLocations into the second (wrong error locations). FOAST/PAST node fingerprints now include 'location'; the DSL definition function is fingerprinted by its full SourceDefinition again. Location-agnostic fingerprinting remains for the ITIR persistent cache and var-name generation. Adds a location-sensitivity regression test.
- [3/10] Fix hard crash at @field_operator decoration when closure vars hold a locally-defined FrozenNamespace subclass (icon4py constants idiom): register a content-based deconstructor for eve.utils.Namespace instead of falling through to __reduce_ex__, which referenced the non-importable local class.
- [7/10] Capture builtin-container *subclass* instance __dict__ state so a dict/list/set subclass with extra attributes no longer collides with a plain instance (false cache hit). Exact builtins are unaffected.
- [10/10] Cheap fingerprinting-hot-path wins: cache _dataclass_fields per type; construct the recombined Deconstruction directly instead of dataclasses.replace.

[1/10], [4/10], [5/10], [6/10] were already addressed in the previous commit.

egparedes · 2026-06-11T19:21:07Z

Thanks for the thorough automated pass — it caught a real location-handling inversion I'd half-mirrored. Addressed in two commits (48027da and c7f9b1d). Disposition below.

Fixed

[1/10] persistent ITIR key includes location/type — the gtfn/dace translation FileCache now keys on a restored stages.fingerprint_compilable_program built on itir.semantic_fingerprinter (skips location/type), so unrelated edits / checkout moves no longer bust the persistent cache.
[2/10] ffront keys location-insensitive → cross-file aliasing — confirmed against main's deleted test_stages.py (samecode_fieldop != fieldop). FOAST/PAST node fingerprints now include location again, and the DSL definition is fingerprinted by its full SourceDefinition. Location-agnostic fingerprinting stays where it belongs (ITIR persistent cache + the __val_/__cond_ var-name generation, which use itir.semantic_fingerprinter). Added a test_fingerprinter_is_location_sensitive regression test.
[3/10] local FrozenNamespace subclass crashes at decoration — registered a content-based deconstructor for eve.utils.Namespace, so constant namespaces in closure vars (the icon4py idiom) are fingerprinted by their items instead of falling through to __reduce_ex__ and referencing the non-importable local class.
[4/10] executor content-hashes neighbor tables — restored stages.compilation_hash for the in-memory executor key: location-agnostic program fingerprint + offset providers by identity/order via hash_offset_provider_items_by_id (O(1), no D2H copy).
[5/10] env var prefix — now GT4PY_BUILD_CACHE_VERSION_ID.
[6/10] __subclasshook__ — now if cls is DataModelABC and is_datamodel(subclass): return True; return NotImplemented (genuine subclasses and derived-ABC checks defer to the MRO).
[7/10] container-subclass instance state dropped — dict/list/set/… subclasses now contribute their instance __dict__ to the fingerprint; exact builtins are unchanged.
[10/10] per-node overhead — applied the cheap, safe wins: _dataclass_fields is now cached per type, and catabolize's COMBINE constructs the recombined Deconstruction directly instead of via dataclasses.replace. (Left the hex/encode('ascii') digest round-trip alone for now — it changes the collapser format and the order-insensitive sort, so it felt out of scope for a fix-up.)

Each fix has a unit test; all affected suites are green, mypy/pre-commit clean, and an end-to-end run_gtfn_cached compile + cache-hit checks out.

Left as-is (flagging for your call)

[8/10] breaking CachedStep API — this is the intent of the PR (fingerprinting the step is the new "step state in caches" behavior), so I didn't add a deprecation shim. I did fix the doc recipe and cache the per-instance step fingerprint so it isn't re-walked on every call. Happy to add a transitional hash_function alias if you want a deprecation window.
[9/10] dead infrastructure — merge_dispatchers was deliberately restored earlier in the branch, and DataModelABC is new public API (now with a correct hook), so removing them felt like your decision rather than a review fix-up. Easy to drop them + @runtime_checkable/BaseStage/the fingerprinter alias if you'd prefer.

🤖 Generated with Claude Code

…fn cache-key crash)
The frontend-stage cache keys fingerprint program-likes (FieldOperator, Program, ...) when they appear in another program's closure variables. The PR removed the dedicated FieldOperator/Program fingerprint registrations, so the whole 'backend' object graph is now traversed. That graph can hold non-importable objects (e.g. the 'unittest.mock.Mock' backend that test_compiled_program swaps in, whose dynamically-created 'Mock' subclass is rejected by the by-qualified-name reference deconstructor), crashing fingerprinting with TypeError (test_compiled_program gtfn variants, internal nomesh CI jobs). Exclude 'backend' from the fingerprint via gt4py_metadata(fingerprint=False): it does not affect the lowering (which is what these stage caches key), the backend-specific compilation is keyed separately in the backend's own caches, and traversing the whole backend graph per cache lookup was wasteful and fragile. Adds a regression test.

havogt · 2026-06-12T09:02:24Z

+    def __getstate__(self) -> dict[str, Any]:
+        # Serialize only the dataclass fields, excluding cached properties
+        # stored in `__dict__` (which may not be picklable).
+        return {f.name: getattr(self, f.name) for f in dataclasses.fields(self)}
+


This was introduced as a fix by Claude. Is it needed, @egparedes?
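For context on the question: `functools.cached_property` stores its result in the instance `__dict__` (even on a frozen dataclass, since it bypasses `__setattr__`), and pickle serializes the whole `__dict__` by default, so a cached, unpicklable value makes the object unpicklable. The sketch below uses a hypothetical `Step` class with a lock-valued cached property to show the method above keeping only dataclass fields; whether gt4py's classes actually need it depends on whether they are pickled with such values cached.

```python
import dataclasses
import functools
import pickle
import threading
from typing import Any, Dict


@dataclasses.dataclass(frozen=True)
class Step:  # hypothetical example class
    factor: int

    @functools.cached_property
    def lock(self) -> threading.Lock:
        # Stored in self.__dict__ on first access; locks cannot be pickled.
        return threading.Lock()

    def __getstate__(self) -> Dict[str, Any]:
        # Serialize only the dataclass fields, excluding cached properties.
        return {f.name: getattr(self, f.name) for f in dataclasses.fields(self)}


s = Step(3)
_ = s.lock  # populates s.__dict__["lock"]
restored = pickle.loads(pickle.dumps(s))  # without __getstate__, dumps() fails on the lock
```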

havogt · 2026-06-12T09:09:07Z


 from __future__ import annotations

+import collections


Maybe the fingerprinting related utils deserve their own module?

edopao · 2026-06-12T11:34:11Z

                lambda o: workflow.CachedStep(
                    o.bare_translation,
-                    hash_function=stages.fingerprint_compilable_program,
+                    key_function=stages.fingerprint_compilable_program,


The current FileCache key does not include the stage fingerprint. I think it should be added; the question is whether to do it now or in a separate PR.

Another comment about this aspect:

[1/10] persistent ITIR key includes location/type — the gtfn/dace translation FileCache now keys on a restored stages.fingerprint_compilable_program built on itir.semantic_fingerprinter (skips location/type), so unrelated edits / checkout moves no longer bust the persistent cache.

In my opinion, the FileCache key should include the type: translated programs for different types should not clash in the disk cache.

edopao · 2026-06-12T11:54:02Z

+- The OTF build cache (`CachedStep`) keys compiled artifacts by a fingerprint
+  of the workflow step **and** its input, so a cached compiled program can be
+  reused only when both are unchanged.


This text does not match the key of the disk cache in the translation stage (see my other comments).

egparedes and others added 30 commits May 27, 2026 15:25

Refactoring ffront stages with fingerprinted protocol

6d57783

Refactoring workflows and programs/field operators

b21528f

Refactoring executors and other fixes

f3677ad

Add build cache version id

7529251

WIP

55e10b3

More refactoring

85f381e

Merge branch 'main' into fix/add-step-state-to-caches

d2ea039

Fixes for failing pickle cases and small cleaning up refactorings.

2d7b483

More fixes and cleanups

9cce321

Remove .fingerprint method/property

0e03f58

Merge branch 'main' into fix/add-step-state-to-caches

3dada50

Rename default fingerprinters

60fd5e9

Test fixes

854450e

Merge branch 'main' into fix/add-step-state-to-caches

99c0897

Address review comments

436e3ff

Address more review comments

25e38ef

More review fixes

d1a15ee

WIP refactoring of the pickler and fingerprinting helpers

06f3872

Fixes for eve/utils

cdaa064

Fixes for next/utils

db01ecc

Fix ffront stages fingerprinting

9c3a0e2

Fix typings

df7b53b

Merge branch 'main' into fix/add-step-state-to-caches

497d7b3

Remove unneeded utils

2546cf3

Add version id

00a4bb2

Add ADR for fingerprinting

56a3c1b

Fixes for internal review issues

1896afb

egparedes added 3 commits June 10, 2026 17:39

Add unit tests for tree_cata and the fingerprinting fixes

a7d8805

Describe layered catamorphism design and optree alternative in ADR 0023

0364a69

Honor copyreg.dispatch_table reducers in the fingerprint fallback

7096f05

NumPy ufuncs (e.g. np.cbrt in DSL closure variables) only pickle through their copyreg.dispatch_table reducer; calling __reduce_ex__ directly raises. Consult the dispatch table first, like pickle.Pickler does.

egparedes mentioned this pull request Jun 10, 2026

refactor[next]: layer the structural fingerprinter as a catamorphism with digest algebras egparedes/gt4py#9

Closed

egparedes added 15 commits June 10, 2026 19:15

Rename decomposition vocabulary to DecompositionAtom/ObjectDecomposition

a59f41e

Propagate the rename through docstrings, the tree_cata doctest, tests and ADR 0023; cache _class_tag with functools.cache.

Rename reduce_object (formerly tree_cata)

b642812

Add AtomReducer and CompositeReducer type aliases

c09a2ce

Add CycleReducer type alias

79e3e51

Rename reducer parameters and implementations to match the type aliases

a8c8fbd

atom_reducer/composite_reducer/cycle_reducer (formerly leaf_alg/node_alg/cycle_alg), with matching parameter and prose updates in tests and ADR.

Replace the _leaf() helper with the AtomicContent.from_typed_value() builder

fe70f2f

Cleanups and refactorings

a31222f

Restore eve.utils.merge_dispatchers() and its tests

95d16ae

Rename catabolize() (formerly reduce_object) and use a _VisitAction enum

e888bfd

Also fix the import-order NameError reintroduced by moving the public deconstructor builders above their implementation dependencies.

Merge branch 'main' into refactor/fingerprinting-tree-cata

eede3a2

havogt reviewed Jun 11, 2026


egparedes added 2 commits June 11, 2026 20:30

egparedes requested review from edopao and havogt June 11, 2026 21:51

havogt reviewed Jun 12, 2026


edopao reviewed Jun 12, 2026


edopao mentioned this pull request Jun 15, 2026

fix[next-dace]: Add dace configuration to compilation fingerprint #2650

Open
