Merged
7 changes: 7 additions & 0 deletions .changeset/searchable-canonical-name-prefix.md
@@ -0,0 +1,7 @@
---
"@ensnode/ensdb-sdk": patch
"ensindexer": patch
"ensapi": patch
---

Add a materialized `domains.__canonical_name_prefix` column — the first 64 code points of `canonical_name` — to back left-anchored / substring search and NAME ordering. Direct-SQL consumers can now `WHERE __canonical_name_prefix LIKE 'vit%' ORDER BY __canonical_name_prefix` instead of replicating the previous `left(canonical_name, 256)` expression index. `canonical_name` is unchanged and remains the column for exact (`=` / `IN`) matches and display; the Omnigraph `name.starts_with` filter now targets the prefix column while continuing to return `canonical_name`.
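The changeset defines the prefix as the first 64 code points of `canonical_name`, which matches how Postgres `left(text, N)` counts characters. A JS-side truncation therefore has to iterate code points rather than use `String.prototype.slice`, which counts UTF-16 code units. The sketch below illustrates the difference. `truncateToCodePoints` and `PREFIX_CODE_POINTS` are hypothetical names; this is not the `truncateCanonicalNamePrefix` export from `@ensnode/ensdb-sdk`, whose implementation this diff does not show.

```typescript
// Assumed cap, taken from the changeset's "first 64 code points".
const PREFIX_CODE_POINTS = 64;

// Hypothetical helper: keep at most `max` Unicode code points.
// Spreading a string iterates by code point, so surrogate pairs stay intact.
export function truncateToCodePoints(value: string, max: number): string {
  return [...value].slice(0, max).join("");
}

// 63 ASCII characters, then one emoji (1 code point, 2 UTF-16 code units).
const sample = "a".repeat(63) + "😀" + ".eth";

const byCodePoint = truncateToCodePoints(sample, PREFIX_CODE_POINTS);
const byCodeUnit = sample.slice(0, PREFIX_CODE_POINTS); // counts UTF-16 units

console.log([...byCodePoint].length); // 64
console.log(byCodePoint.length); // 65 (the emoji is kept whole)
console.log(byCodeUnit.endsWith("\uD83D")); // true (a lone high surrogate)
```

Slicing by code unit would leave a dangling surrogate, so a JS-computed prefix could disagree with the `left(...)` value Postgres writes for the same name.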
@@ -5,44 +5,6 @@ import type { DomainCursor } from "@/omnigraph-api/lib/find-domains/domain-curso
import type { DomainsOrderBy } from "@/omnigraph-api/schema/domain-inputs";
import type { OrderDirection } from "@/omnigraph-api/schema/order-direction";

-/**
-* Length cap (in characters) of the `canonical_name` prefix used by:
-* 1. the `(registry_id, left(canonical_name, N), id)` composite btree on `domains`,
-* 2. all NAME-ordered queries' ORDER BY expressions, and
-* 3. the value stored in `DomainCursor.value` when ordering by NAME — pre-truncated at
-* encode time via {@link truncateNameForCursor} so filter-time comparisons are simple
-* tuple compares against the index expression with no per-row `left(...)` re-application.
-*
-* The btree per-tuple max is ~2712 bytes; with `registry_id` and `id` consuming ~240 bytes of
-* that, ~2400 bytes remain for the prefix expression. 256 chars × max 4-byte UTF-8 codepoint =
-* 1024 bytes, well under the limit and within the realm of reasonable name lengths (mainnet avg
-* is ~126). Queries MUST sort by this same expression for the planner to use the index for
-* ordered scan; raw `canonical_name` ORDER BY falls back to a full scan + sort.
-*
-* An alternative solution is to redefine InterpretedLabel to enforce a maximum byte length of 255 before
-* being truncated into an Encoded LabelHash — this mirrors a name's resolvability (must be dns-encodable)
-* and allows us to avoid storing spam names. Then we'd also have to produce a b-tree-indexed
-* materializedCanonicalName field that's length-capped as well to fit the btree index. Then we could
-* query against that column instead of the full InterpretedName. All of that would avoid this
-* LEFT(...) expression index and the necessity for the query pattern to match the defined index
-* (to avoid the full scan).
-*/
-export const CANONICAL_NAME_SORT_PREFIX = 256;
-
-/**
-* Truncate a `canonicalName` to the cursor / index prefix length. Used when writing the cursor
-* value for NAME orderings — callers slice once at encode time so the encoded cursor stays small
-* (long names can hit thousands of characters) and `cursorFilter` can compare directly against
-* the index expression without re-applying `left(...)` per row.
-*
-* Uses code-point iteration (`[...name]`) rather than `String.slice`, which counts UTF-16 code
-* units and would split surrogate pairs. Postgres `left(text, N)` counts characters (code
-* points), so this keeps the JS-side and DB-side prefixes byte-identical.
-*/
-export function truncateNameForCursor(name: string | null): string | null {
-return name === null ? null : [...name].slice(0, CANONICAL_NAME_SORT_PREFIX).join("");
-}
-
/**
* The order column / expression for each `DomainsOrderBy` value.
*
@@ -54,7 +16,7 @@ function getOrderColumn(orderBy: typeof DomainsOrderBy.$inferType): SQL {
const { ensIndexerSchema } = di.context;
switch (orderBy) {
case "NAME":
-return sql`left(${ensIndexerSchema.domain.canonicalName}, ${sql.raw(String(CANONICAL_NAME_SORT_PREFIX))})`;
+return sql`${ensIndexerSchema.domain.__canonicalNamePrefix}`;
case "DEPTH":
return sql`${ensIndexerSchema.domain.canonicalDepth}`;
case "REGISTRATION_TIMESTAMP":
@@ -117,8 +79,6 @@ export function cursorFilter(
const value = (() => {
switch (cursor.by) {
case "NAME":
-// Already pre-truncated at encode time (see `truncateNameForCursor`), so this matches
-// the index expression `left(canonical_name, CANONICAL_NAME_SORT_PREFIX)` directly.
return sql`${cursor.value}::text`;
case "DEPTH":
return sql`${cursor.value}::int`;
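With the cursor value now read straight from `__canonical_name_prefix`, a NAME-ordered page is presumably a keyset scan over the `(prefix, id)` tuple: the next page starts strictly after the last row's tuple. One consequence of the 64-code-point cap is that names sharing their first 64 code points compare equal on the prefix, so `id` alone decides their order. The in-memory sketch below models that. `Key`, `Row`, `compareKeys` and `nextPage` are hypothetical, and JavaScript's `<` on strings compares UTF-16 code units, which need not agree with the collation Postgres uses for the column.

```typescript
// Keyset fields for NAME ordering; `prefix` stands in for __canonical_name_prefix.
// Assumption: only canonical rows are paged, so the prefix is never null here.
interface Key {
  prefix: string;
  id: string;
}

// Hypothetical row shape; real Domain rows carry many more columns.
interface Row extends Key {
  canonicalName: string;
}

// Lexicographic (prefix, id) comparison, like SQL `(prefix, id) > ($1, $2)`.
// Caveat: JS `<` on strings uses UTF-16 code unit order, not a DB collation.
function compareKeys(a: Key, b: Key): number {
  if (a.prefix !== b.prefix) return a.prefix < b.prefix ? -1 : 1;
  if (a.id !== b.id) return a.id < b.id ? -1 : 1;
  return 0;
}

// Hypothetical helper: up to `limit` rows strictly after `after`, in (prefix, id) order.
export function nextPage(rows: readonly Row[], limit: number, after?: Key): Row[] {
  return rows
    .filter((row) => after === undefined || compareKeys(row, after) > 0)
    .sort(compareKeys)
    .slice(0, limit);
}

// Two long names share the same 64-code-point prefix.
const longA = "a".repeat(64);
const rows: Row[] = [
  { id: "3", canonicalName: `${longA}0.eth`, prefix: longA },
  { id: "1", canonicalName: "vit.eth", prefix: "vit.eth" },
  { id: "2", canonicalName: `${longA}1.eth`, prefix: longA },
];

const page1 = nextPage(rows, 2); // ids 2, 3
const last = page1[page1.length - 1];
const page2 = nextPage(rows, 2, { prefix: last.prefix, id: last.id }); // id 1
console.log(page1.map((r) => r.id), page2.map((r) => r.id));
```

Note that row `2` (`…1.eth`) comes before row `3` (`…0.eth`) even though full-name order would reverse them: beyond 64 code points, NAME ordering is by `id`, not by the rest of the name.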
@@ -11,7 +11,6 @@ import { DomainCursors } from "@/omnigraph-api/lib/find-domains/domain-cursor";
import {
cursorFilter,
orderFindDomains,
-truncateNameForCursor,
} from "@/omnigraph-api/lib/find-domains/find-domains-resolver-helpers";
import type { DomainOrderValue } from "@/omnigraph-api/lib/find-domains/types";
import { lazyConnection } from "@/omnigraph-api/lib/lazy-connection";
@@ -65,7 +64,7 @@ const VERSION_TO_DOMAIN_TYPE: Record<
function nameCondition(filter: typeof DomainsNameFilter.$inferInput): SQL {
const { ensIndexerSchema } = di.context;
if (filter.starts_with) {
-return ilike(ensIndexerSchema.domain.canonicalName, `${filter.starts_with}%`);
+return ilike(ensIndexerSchema.domain.__canonicalNamePrefix, `${filter.starts_with}%`);
}

if (filter.eq) {
@@ -255,7 +254,7 @@ export function resolveFindDomains(
const __orderValue: DomainOrderValue = (() => {
switch (orderBy) {
case "NAME":
-return truncateNameForCursor(domain.canonicalName);
+return domain.__canonicalNamePrefix;
case "DEPTH":
return domain.canonicalDepth;
case "REGISTRATION_TIMESTAMP":
2 changes: 1 addition & 1 deletion apps/ensapi/src/omnigraph-api/schema/domain-inputs.ts
@@ -69,7 +69,7 @@ export const DomainsNameFilter = builder.inputType("DomainsNameFilter", {
fields: (t) => ({
starts_with: t.string({
description:
-"Prefix-match on Interpreted Name for typeahead. ex: 'vit', 'vitalik.et'. Case-insensitive (InterpretedName labels are normalized).",
+"Prefix-match on Interpreted Name for typeahead. ex: 'vit', 'vitalik.et'. Case-insensitive (InterpretedName labels are normalized). Matched against the first 64 code points of the name; prefixes longer than 64 code points never match.",
validate: { minLength: 1 },
}),
eq: t.field({
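Because `starts_with` is now matched against a value capped at 64 code points, an over-long input can be recognized as unmatchable before it reaches the database. The sketch below builds an `ILIKE` prefix pattern under that assumption. Two parts go beyond this diff: the `null` return for over-long input is a hypothetical short-circuit, and the escaping of `%`, `_` and `\` is a suggestion. As far as this diff shows, `nameCondition` interpolates `starts_with` into the pattern unescaped, so those characters would act as `ILIKE` wildcards. `startsWithPattern` and `escapeLikePattern` are hypothetical names.

```typescript
// Assumed cap, matching the 64-code-point prefix column.
const PREFIX_CODE_POINTS = 64;

// Escape LIKE/ILIKE metacharacters so user input matches literally.
// Backslash is Postgres's default escape character in LIKE patterns.
function escapeLikePattern(input: string): string {
  return input.replace(/[\\%_]/g, "\\$&");
}

// Hypothetical helper: an ILIKE prefix pattern, or null when the input
// can never match a value capped at PREFIX_CODE_POINTS.
export function startsWithPattern(input: string): string | null {
  // Count code points, not UTF-16 units, to match how the column is truncated.
  if ([...input].length > PREFIX_CODE_POINTS) return null;
  return `${escapeLikePattern(input)}%`;
}

console.log(startsWithPattern("vit")); // vit%
console.log(startsWithPattern("100%_club")); // 100\%\_club%
console.log(startsWithPattern("x".repeat(65))); // null
```

The length check runs on the raw input, so an escaped pattern may exceed 64 characters while still describing a prefix of at most 64 code points.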
34 changes: 25 additions & 9 deletions apps/ensindexer/src/lib/ensv2/canonicality-db-helpers.ts
@@ -11,6 +11,10 @@ import type {
RegistryId,
} from "enssdk";

+import {
+CANONICAL_NAME_PREFIX_LENGTH,
+truncateCanonicalNamePrefix,
+} from "@ensnode/ensdb-sdk/ensindexer-abstract";
import { isRootRegistryId } from "@ensnode/ensnode-sdk";
import { isBridgedResolver, isBridgedTargetRegistry } from "@ensnode/ensnode-sdk/internal";

@@ -151,6 +155,7 @@ export async function ensureDomainInRegistry(
await context.ensDb.update(ensIndexerSchema.domain, { id: domainId }).set({
canonical: true,
canonicalName,
+__canonicalNamePrefix: truncateCanonicalNamePrefix(canonicalName),
canonicalLabelHashPath,
canonicalPath,
canonicalDepth: canonicalLabelHashPath.length,
@@ -359,8 +364,9 @@ async function reconcileRegistryCanonicality(

/**
* Propagate a Label heal to every canonical Domain whose `canonicalLabelHashPath` contains
-* `labelHash`. Re-renders `canonical_name` by joining each path element to its current
-* `label.interpreted` value. `canonicalLabelHashPath` is head-first (root → leaf), but
+* `labelHash`. Re-renders `canonical_name` (and its materialized `__canonical_name_prefix`) by
+* joining each path element to its current `label.interpreted` value, computing the name once in a
+* CTE so the `string_agg` isn't run twice. `canonicalLabelHashPath` is head-first (root → leaf), but
* `canonicalName` is the standard leaf-first ENS string (e.g. "vitalik.eth"), so the
* WITH ORDINALITY rows are joined in DESC ordinal order.
*
@@ -376,14 +382,23 @@ export async function cascadeLabelHeal(
labelHash: LabelHash,
): Promise<void> {
await context.ensDb.sql.execute(sql`
-UPDATE ${ensIndexerSchema.domain} AS d
-SET canonical_name = (
-SELECT string_agg(l.interpreted, '.' ORDER BY p.ord DESC)
-FROM unnest(d.canonical_label_hash_path) WITH ORDINALITY AS p(lh, ord)
-JOIN ${ensIndexerSchema.label} l ON l.label_hash = p.lh
-)
+WITH healed_names AS (
+SELECT
+d.id,
+(
+SELECT string_agg(l.interpreted, '.' ORDER BY p.ord DESC)
+FROM unnest(d.canonical_label_hash_path) WITH ORDINALITY AS p(lh, ord)
+JOIN ${ensIndexerSchema.label} l ON l.label_hash = p.lh
+) AS name
+FROM ${ensIndexerSchema.domain} d
WHERE d.canonical = true
-AND d.canonical_label_hash_path @> ARRAY[${labelHash}]::text[];
+AND d.canonical_label_hash_path @> ARRAY[${labelHash}]::text[]
+)
+UPDATE ${ensIndexerSchema.domain} AS d
+SET canonical_name = healed_names.name,
+__canonical_name_prefix = left(healed_names.name, ${CANONICAL_NAME_PREFIX_LENGTH})
+FROM healed_names
+WHERE d.id = healed_names.id;
`);
}

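The heal query reads `canonical_label_hash_path` root to leaf and aggregates in descending ordinal order, so the leaf label comes first in the rebuilt name; the prefix is then `left(name, 64)`. The in-memory sketch below mirrors that logic. It is illustrative only: `renderCanonicalName`, the `Map`-based label table and the placeholder keys are hypothetical. Where the SQL's inner `JOIN` would silently drop a path element that has no `label` row, this sketch throws instead, a deliberate choice to make that case visible.

```typescript
// Assumed prefix cap (code points), per the changeset.
const PREFIX_CODE_POINTS = 64;

// Hypothetical label table: labelHash -> interpreted label.
type LabelTable = ReadonlyMap<string, string>;

// Rebuild a leaf-first name from a head-first (root to leaf) label-hash path,
// mirroring `string_agg(l.interpreted, '.' ORDER BY p.ord DESC)`.
export function renderCanonicalName(
  pathRootToLeaf: readonly string[],
  labels: LabelTable,
): { name: string; prefix: string } {
  const interpreted = pathRootToLeaf.map((labelHash) => {
    const label = labels.get(labelHash);
    // The SQL inner JOIN would silently drop this element; surface it instead.
    if (label === undefined) throw new Error(`no label row for ${labelHash}`);
    return label;
  });
  // DESC ordinal order: the leaf label comes first.
  const name = interpreted.reverse().join(".");
  // Same code-point truncation as left(name, 64).
  const prefix = [...name].slice(0, PREFIX_CODE_POINTS).join("");
  return { name, prefix };
}

// Placeholder keys, not real labelhashes.
const labels: LabelTable = new Map([
  ["0xeth", "eth"],
  ["0xvitalik", "vitalik"],
]);

console.log(renderCanonicalName(["0xeth", "0xvitalik"], labels).name); // vitalik.eth
```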
@@ -494,6 +509,7 @@ async function cascadeCanonicality(
UPDATE ${ensIndexerSchema.domain} AS d
SET canonical = ${nextCanonical},
canonical_name = CASE WHEN ${nextCanonical} THEN dt.new_name ELSE NULL END,
+__canonical_name_prefix = CASE WHEN ${nextCanonical} THEN left(dt.new_name, ${CANONICAL_NAME_PREFIX_LENGTH}) ELSE NULL END,
canonical_label_hash_path = CASE WHEN ${nextCanonical} THEN dt.new_path ELSE NULL END,
canonical_path = CASE WHEN ${nextCanonical} THEN dt.new_path_ids ELSE NULL END,
canonical_depth = CASE WHEN ${nextCanonical} THEN array_length(dt.new_path, 1) ELSE NULL END,
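In `cascadeCanonicality`, each canonical column, now including `__canonical_name_prefix`, takes its new value when `nextCanonical` is true and `NULL` otherwise, so the prefix is null exactly when `canonical_name` is. The sketch below models that `CASE WHEN` block as a plain function. `CanonicalFields` and `canonicalFields` are hypothetical names, `canonical_path` (the path of ids) is omitted for brevity, and the depth follows Postgres semantics, where `array_length(arr, 1)` returns `NULL` for an empty array.

```typescript
// Hypothetical mirror of the canonical columns touched by the UPDATE above.
interface CanonicalFields {
  canonical: boolean;
  canonicalName: string | null;
  canonicalNamePrefix: string | null;
  canonicalLabelHashPath: string[] | null;
  canonicalDepth: number | null;
}

// Assumed prefix cap (code points).
const PREFIX_CODE_POINTS = 64;

// Models `CASE WHEN nextCanonical THEN <new value> ELSE NULL END` per column.
export function canonicalFields(
  nextCanonical: boolean,
  newName: string,
  newPath: string[],
): CanonicalFields {
  if (!nextCanonical) {
    return {
      canonical: false,
      canonicalName: null,
      canonicalNamePrefix: null,
      canonicalLabelHashPath: null,
      canonicalDepth: null,
    };
  }
  return {
    canonical: true,
    canonicalName: newName,
    // Same truncation as left(new_name, 64): count code points.
    canonicalNamePrefix: [...newName].slice(0, PREFIX_CODE_POINTS).join(""),
    canonicalLabelHashPath: newPath,
    // array_length(new_path, 1) is NULL for an empty array in Postgres.
    canonicalDepth: newPath.length > 0 ? newPath.length : null,
  };
}

console.log(canonicalFields(false, "vitalik.eth", ["a", "b"]).canonicalNamePrefix); // null
console.log(canonicalFields(true, "vitalik.eth", ["a", "b"]).canonicalDepth); // 2
```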
@@ -59,6 +59,20 @@ Performing SQL queries on the ENS Unigraph requires that you have the `unigraph`

Fetch a Domain by its canonical name. Because `canonical_name` is materialized across both ENSv1 and ENSv2, the same lookup works regardless of protocol version. See [Connect](/docs/integrate/unigraph/examples) for setup.

+:::tip[Searching vs. displaying]
+A `canonical_name` can be very long, but it is always the full, correct name, so **select and display `canonical_name`**. When you need to **search** by prefix, filter on the materialized `__canonical_name_prefix` column instead: it holds the first 64 code points of `canonical_name` and is backed by a GIN trigram index, so a case-insensitive `ILIKE 'vit%'` filter (matching the Omnigraph `starts_with` filter) can use the index:
+
+```sql
+SELECT id, type, canonical_name, canonical_node, owner_id
+FROM ensindexer_0.domains
+WHERE __canonical_name_prefix ILIKE 'vit%'
+ORDER BY __canonical_name_prefix
+LIMIT 10;
+```
+
+The `SELECT` still returns `canonical_name`; only the `ILIKE` filter and the `ORDER BY` use the prefix. The GIN trigram index serves the `ILIKE` filter, and Postgres then sorts the matched rows; a small `LIMIT` keeps that sort cheap, although every matching row is still read. For index-backed ordering as well, scope the query by `registry_id` so the planner can use the `(registry_id, __canonical_name_prefix, id)` btree. Because the column is capped, a search prefix longer than 64 code points never matches. For exact matches, use `canonical_name` directly (`canonical_name = 'vitalik.eth'`).
+:::
+
:::note[Canonical fields]
Canonical fields are populated on every Domain reachable from the canonical root, across both ENSv1 and ENSv2 — query them uniformly without branching by `type`. In SQL, these columns are `canonical_name`, `canonical_path`, `canonical_node`, and `canonical_depth`; in `ensdb-sdk`, the corresponding fields are `canonicalName`, `canonicalPath`, `canonicalNode`, and `canonicalDepth`.
:::
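For direct-SQL consumers, the docs' prefix search can be parameterized and, optionally, scoped by `registry_id` so the `(registry_id, __canonical_name_prefix, id)` btree can drive the ordering. The sketch below only builds SQL text with `$1`, `$2` placeholders plus a values array, the shape node-postgres (`pg`) accepts in `client.query(text, values)`. `buildPrefixSearch` is hypothetical, the `ensindexer_0` schema name is copied from the docs example and may differ per deployment, and adding `id` as an `ORDER BY` tiebreaker (for stable paging when names share a 64-code-point prefix) is a suggestion, not something the docs specify.

```typescript
// Hypothetical query shape, compatible with pg's client.query(text, values).
export interface Query {
  text: string;
  values: unknown[];
}

// Hypothetical builder for the docs' prefix-search query.
export function buildPrefixSearch(prefix: string, limit: number, registryId?: string): Query {
  // Escape LIKE metacharacters so the prefix is matched literally.
  const pattern = `${prefix.replace(/[\\%_]/g, "\\$&")}%`;
  const values: unknown[] = [pattern];
  const where = ["__canonical_name_prefix ILIKE $1"];
  if (registryId !== undefined) {
    values.push(registryId);
    // Equality on the leading btree column lets the index supply the ordering.
    where.push(`registry_id = $${values.length}`);
  }
  values.push(limit);
  const text = [
    "SELECT id, type, canonical_name, canonical_node, owner_id",
    "FROM ensindexer_0.domains",
    `WHERE ${where.join(" AND ")}`,
    // `id` breaks ties between names that share their first 64 code points.
    "ORDER BY __canonical_name_prefix, id",
    `LIMIT $${values.length}`,
  ].join("\n");
  return { text, values };
}

const scoped = buildPrefixSearch("vit", 10, "some-registry-id");
console.log(scoped.text);
console.log(scoped.values); // [ 'vit%', 'some-registry-id', 10 ]
```

With a connected `pg` client, the result would be passed as `client.query(scoped.text, scoped.values)`.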