feat(unigraph): materialize searchable __canonical_name_prefix column#2255
feat(unigraph): materialize searchable __canonical_name_prefix column#2255shrugs wants to merge 7 commits into
Conversation
Add a length-capped (64 code point) materialized `domains.__canonical_name_prefix` column to back left-anchored/substring search and NAME ordering, replacing the `left(canonical_name, 256)` expression index. Direct-SQL consumers can now `WHERE __canonical_name_prefix LIKE 'vit%' ORDER BY __canonical_name_prefix` without replicating the expression. - ensdb-sdk: column (explicit DB name to preserve the `__`), shared CANONICAL_NAME_PREFIX_LENGTH + truncateCanonicalNamePrefix, repointed composite btree + GIN trigram onto the prefix column; hash on canonical_name kept for eq/in. - ensindexer: maintain the column in all three canonicality write paths (ensureDomainInRegistry, cascadeCanonicality, cascadeLabelHeal). - ensapi: Omnigraph name.starts_with + NAME order/cursor target the prefix column; canonical_name still returned everywhere and used for exact matches. - docs: schema-reference column/index + prefix-search example. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
The latest updates on your projects. Learn more about Vercel for GitHub.
2 Skipped Deployments
|
🦋 Changeset detectedLatest commit: a52694f The changes in this PR will be included in the next version bump. This PR includes changesets to release 24 packages
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
|
Warning Review limit reached
More reviews will be available in 33 minutes and 12 seconds. Learn how PR review limits work. Your organization has run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Organization UI Review profile: ASSERTIVE Plan: Pro Run ID: ⛔ Files ignored due to path filters (1)
📒 Files selected for processing (8)
📝 WalkthroughWalkthroughThis PR adds a materialized ChangesSearchable canonical name prefix materialization
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@apps/ensapi/src/omnigraph-api/lib/find-domains/find-domains-resolver.ts`:
- Around line 68-70: The current prefix search only queries the materialized
__canonicalNamePrefix which is truncated to 64 code points and thus drops
matches when filter.starts_with is longer; change the logic in the
find-domains-resolver (where return
ilike(ensIndexerSchema.domain.__canonicalNamePrefix, `${filter.starts_with}%`)
is used) to handle long prefixes: if filter.starts_with length <= 64 keep the
existing ilike against __canonicalNamePrefix, otherwise run the starts-with
ilike against the full canonicalName (ensIndexerSchema.domain.canonicalName)
instead (or use a combined condition: prefix-ilike OR canonicalName-ilike) so
long inputs still match. Ensure you reference filter.starts_with,
__canonicalNamePrefix and canonicalName when making the change.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: f2c6dd11-bfc6-46a0-89c0-0cd15c984e92
📒 Files selected for processing (7)
.changeset/searchable-canonical-name-prefix.mdapps/ensapi/src/omnigraph-api/lib/find-domains/find-domains-resolver-helpers.tsapps/ensapi/src/omnigraph-api/lib/find-domains/find-domains-resolver.tsapps/ensindexer/src/lib/ensv2/canonicality-db-helpers.tsdocs/ensnode.io/src/content/docs/docs/integrate/unigraph/examples/domain-by-name.mdxdocs/ensnode.io/src/content/docs/docs/integrate/unigraph/schema-reference.mdxpackages/ensdb-sdk/src/ensindexer-abstract/unigraph.schema.ts
- drop truncateNameForCursor wrapper; use truncateCanonicalNamePrefix directly - remove now-redundant inline comments (starts_with, cursor NAME) - rename cascadeLabelHeal CTE healed -> canonical_name - early return in truncateCanonicalNamePrefix - docs: reword search-vs-display tip; convert all Indexes sections to lists; document current state only (drop previous-approach framing) - changeset: drop "Reindex required" Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
@greptile review |
Greptile SummaryMaterializes a new
Confidence Score: 5/5Safe to merge — all three write paths are consistent, the explicit column name correctly preserves the leading underscores, and exact-match correctness is unaffected. Every place that sets canonical_name also sets __canonical_name_prefix (verified across all three helpers). The explicit t.text("__canonical_name_prefix") bypasses ponder's casing correctly. eq/in still target the full canonical_name column, so exact-match behaviour is unchanged. The cascadeLabelHeal CTE restructure is semantically equivalent to the previous correlated subquery. The 64-char cap is documented at the GraphQL layer. No correctness regressions were found. No files require special attention. Important Files Changed
Reviews (12): Last reviewed commit: "docs(unigraph): use ILIKE in prefix-sear..." | Re-trigger Greptile |
Greptile SummaryThis PR materializes a new
Confidence Score: 4/5Safe to merge; requires a full reindex, but the write-path changes are consistent and the public GraphQL schema is unchanged. All three write paths correctly maintain apps/ensindexer/src/lib/ensv2/canonicality-db-helpers.ts — specifically the Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[Domain write event] --> B{Write path}
B -->|ensureDomainInRegistry JS-side Drizzle update| C["truncateCanonicalNamePrefix(canonicalName) → __canonicalNamePrefix"]
B -->|cascadeCanonicality SQL CTE| D["left(dt.new_name, 64) → __canonical_name_prefix NULL when canonical=false"]
B -->|cascadeLabelHeal SQL CTE restructured| E["WITH canonical_name AS (...) left(canonical_name.name, 64) → __canonical_name_prefix"]
C & D & E --> F[(domains table canonical_name full text __canonical_name_prefix first 64 cp)]
F --> G{Query type}
G -->|starts_with| H["ilike(__canonical_name_prefix, 'vit%') GIN trigram index"]
G -->|eq / in| I["= / IN on canonical_name hash index"]
G -->|NAME ORDER BY| J["ORDER BY __canonical_name_prefix btree index registry_id, prefix, id"]
Reviews (2): Last reviewed commit: "fix(unigraph): address review feedback (..." | Re-trigger Greptile |
- NAME cursor reads stored domain.__canonicalNamePrefix directly (single source of truth) instead of recomputing from canonicalName; drop the now-unused truncateCanonicalNamePrefix import in the resolver - document the 64-code-point starts_with cap in the GraphQL field description; regenerate Omnigraph SDL Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
@greptile review |
the GIN trigram backs the LIKE filter; ORDER BY __canonical_name_prefix sorts the matched set unless scoped by registry_id (composite btree). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
@greptile review |
…loop 5) avoids the CTE name shadowing the target canonical_name column in the UPDATE...SET, which several reviewers found ambiguous. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
@greptile review |
align direct-SQL prefix-search docs with the case-insensitive Omnigraph starts_with semantics; the GIN trigram index serves ILIKE equally. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
@greptile review |
Reviewer Focus (Read This First)
three things:
canonicality-db-helpers.ts— does every place that writescanonical_namenow also write__canonical_name_prefixconsistently (insert, cascade flip, label heal)? thecascadeLabelHealrestructure into a CTE is the only non-mechanical bit.t.text("__canonical_name_prefix")— ponder's default casing strips leading underscores (__hasChildren→has_children), so the explicit name is load-bearing for the raw-SQLSET __canonical_name_prefix = ...clauses to resolve.eq/inmatches still hit the fullcanonical_name, not the prefix.Problem & Motivation
canonical_namecan be arbitrarily long (spam names hit thousands of chars), so it can't go in a plain btree — the per-tuple size limit blows up.Domain.subdomainsrely on a composite expression index(registry_id, left(canonical_name, 256), id), and every query has to replicate that exactleft(canonical_name, 256)expression inORDER BY/keyset to hit the index. invisible and error-prone for anyone writing SQL directly against ENSDb.What Changed (Concrete)
domains.__canonical_name_prefix(Name, explicit DB name to keep the__) — first 64 code points ofcanonical_name, NULL iffcanonical = false.CANONICAL_NAME_PREFIX_LENGTH = 64+truncateCanonicalNamePrefix()inensdb-sdk(code-point slice, byte-identical to postgresleft(text, N)), consumed by both indexer and api.(registry_id, __canonical_name_prefix, id)btree for ordering, GIN trigram forLIKE 'vit%'+ substring.canonical_namehash index kept foreq/in.canonicality-db-helpers.tsmaintains the column in all three write paths:ensureDomainInRegistry,cascadeCanonicality(CTE),cascadeLabelHeal(restructured to compute the healed name once in a CTE, then set both columns). no triggers — ponder doesn't support them on managed tables.name.starts_withand NAME order/cursor now target__canonical_name_prefix;eq/instay oncanonical_name. GraphQL still returnscanonical_nameeverywhere — the prefix column is never exposed.Design & Planning
planned upfront; two decisions confirmed before implementing: 64-char cap (vs 128/256), and keep
%like%substring support (so GIN trigram stays, on the prefix column, and the resolver keepsilikerather than switching to btree + lowercasedLIKE).alternative considered: btree
text_pattern_ops+ case-sensitiveLIKE(lowercase input). rejected once we decided to keep substring search — GIN already serves prefix case-insensitively, soilikepreserves exact current behavior with no input-normalization hack.net effect is the same index shapes as before (composite btree for ordering + GIN trigram for fuzzy), just pointed at a real length-capped column instead of
left(canonical_name, 256)/ the full name.Planning artifacts: internal plan (worktree)
Reviewed / approved by: n/a
Self-Review
__in default casing (__hasChildren→has_children), which would have silently broken the raw-SQLSETclauses — fixed by passing the explicit column name.cascadeLabelHealcomputes thestring_aggname once in a CTE instead of running the correlated subquery twice (once per column).__prefix marks it internal, mirroringRegistry.__hasChildren; docs explicitly steercanonical_namefor exact/display vs__canonical_name_prefixfor search.CANONICAL_NAME_SORT_PREFIXconstant and its long doc block in the resolver helpers;truncateNameForCursornow delegates to the shared helper.Cross-Codebase Alignment
canonical_name,canonicalName,CANONICAL_NAME_SORT_PREFIX,truncateNameForCursor,byCanonicalNameFuzzy,left(,__hasChildren,has_children,starts_with.interpreted(separate concern);ENSDB_SCHEMA_CHECKSUM(computed at runtime viagetDrizzleSchemaChecksum, so it auto-reflects the new column — no stale value); the GraphQLDomainobject (explicit Pothos fields, so the new column isn't auto-exposed).Downstream & Consumer Impact
starts_with/eq/inidentical), so nopnpm generate. requires a reindex (schema change).unigraph/schema-reference.mdx(column + indexes),unigraph/examples/domain-by-name.mdx(prefix-search snippet).__is intentionally part of the SQL column name to signal "internal implementation detail — don't reach for this for exact match/display."Testing Evidence
pnpm typecheck(ensdb-sdk, ensindexer, ensapi),pnpm lint, unit tests--project ensapi --project ensindexer(511 passed) and--project @ensnode/ensdb-sdk(58 passed, incl. schema checksum). full integration CIpnpm test:integration:ci(docker devnet → index → ENSApi → integration suite): 16 files, 350 passed / 2 todo.__canonical_name_prefix == left(canonical_name, 64)on the write path; covered indirectly by the integration find-domains/pagination suite.Scope Reductions
Risk Analysis
ilikesemantics, same ordering collation); reindex rebuilds the column from scratch. revert is a clean single-commit rollback.Pre-Review Checklist (Blocking)