kylejtobin/sit

Semantic Index Types

Naming is programming.

A type's field names are not labels the model ignores. They are instructions the model executes.

# Same type. Different computation.

churn_risk_tier: RiskTier   # assess voluntary customer departure
x7: RiskTier                # pick a union variant

Both declarations are structurally identical. A compiler cannot tell them apart, because it erases names before it runs. A language model tells them apart immediately, because the name is the first thing it reads. A semantic index type is a type declaration whose natural-language tokens (field names, descriptions, variant names) act as computational indices for a neural consumer. They do not mark slots. They tell the model what to compute. The gap between the reader that erases names and the reader that executes them is where the subject starts.

Why this is the precise claim, not hype

The obvious pushback is that names were never free to change. Rename a field and your serializer breaks, your ORM stops mapping, your dispatcher misroutes. That is true, and it is exactly the distinction that makes this new.

Those consumers read a name as an opaque key. A key-reader is invariant under consistent renaming: change the name in the type and in every consumer at once, and behavior is preserved, because nothing depended on what the name meant, only on its matching itself elsewhere. That is the form of alpha-equivalence the runtime has always honored. λx.x and λy.y are the same function; rename throughout and nothing moves.

Every prior runtime reader that responds to a name conditions on its identity: whether it matches a counterpart elsewhere (a serializer key, an ORM column), or fits a fixed lexical rule (Fortran's I-N implicit typing). Identity-conditioning is what makes consistent renaming safe. Rename the counterpart too, or preserve the lexical class, and behavior is preserved, because nothing read what the name meant.
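
A minimal sketch of an identity-conditioning reader, assuming a JSON round trip stands in for the serializer. The classes `Named` and `Renamed` and the helper `round_trip` are hypothetical names introduced for illustration. The field name is used only as a key that must match itself, so renaming it in the type and the consumer together leaves the data unchanged.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Named:
    churn_risk_tier: str


@dataclass
class Renamed:
    x7: str  # same structure; the name is changed consistently everywhere


def round_trip(cls, value):
    """Serialize to JSON and rebuild; the field name is only a matching key."""
    payload = json.dumps(asdict(value))
    return cls(**json.loads(payload))


named = round_trip(Named, Named(churn_risk_tier="high"))
renamed = round_trip(Renamed, Renamed(x7="high"))

# With names erased, both pipelines carry exactly the same data.
same_behavior = list(asdict(named).values()) == list(asdict(renamed).values())
print(same_behavior)
```

Nothing in this pipeline ever reads what `churn_risk_tier` means, which is why the rename is safe here.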

A neural consumer is the first reader that conditions on the name's meaning rather than its identity. There is no counterpart to coordinate with and no lexical rule to satisfy; the meaning is inferred from the token alone. So no consistent rename rescues you. churn_risk_tier to x7, applied everywhere at once, still degrades the output, because the meaning was load-bearing and the meaning is gone.

Prior runtime readers condition on a name's identity and stay invariant under consistent renaming. The neural consumer is the first that conditions on a name's meaning: renames that change meaning change its behavior, while renames that preserve meaning do not. That partition is the violation of alpha-equivalence, and it is the reason a schema is now part of the computation, not just a description of its output.

That is the thesis at the level a type theorist will accept and a builder will feel. Everything below is its consequences.

It is already in your system

If you use Pydantic, Zod, JSON Schema tool definitions, function calling, grammar-constrained decoding, or typed agent tools, you are already authoring semantic index types, named or not. Every field name you choose is steering a model right now. The only question is whether you are doing it on purpose.

So the mental model most teams carry, schema as output format, is no longer sufficient. The schema is part of the inference surface: read going in, not only checked coming out.

| What you assumed | What is actually true |
| --- | --- |
| The schema defines output format | The schema participates in inference |
| Consistent renaming is safe | Consistent renaming can change behavior |
| Descriptions are documentation | Descriptions are executable guidance |
| Validation catches bad outputs | Validation bounds semantic failure |

Choosing churn_risk_tier over attrition_risk_tier selects between two analytical framings, voluntary departure versus passive loss, and the model computes the one you named. Schema authorship is computational authorship.

One declaration, two interpreters

A semantic index type has split operational semantics: one declaration, read by two consumers that disagree about what a name is.

flowchart LR
    A["Type Schema<br/>names, descriptions, variant names, constraints"] --> B["Formal Interpreter<br/>validator / type system"]
    A --> C["Neural Interpreter<br/>language model"]
    B --> D["Structural Channel<br/>which outputs are valid"]
    C --> E["Semantic Channel<br/>which valid outputs become likely"]
    D --> F["Structured Output"]
    E --> F

The formal interpreter erases names and reads structure: arity, types, constraints, construction invariants. It decides what outputs are valid, and governs admissibility.

The neural interpreter reads names and descriptions as task framing: which domain this is, which distinction matters, what kind of computation to perform. It biases which valid output is likely, and governs salience.

Structure defines what can be said. Semantics defines what gets said. Both run on the same text.
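
The split can be sketched with the standard library alone. This is an illustration under stated assumptions, not the project's implementation: the `RiskTier` variants, the descriptions, and the helpers `formal_view` and `neural_view` are all hypothetical. One helper keeps only what a name-erasing checker would keep; the other renders the text a model would actually read.

```python
import json
from dataclasses import dataclass, field, fields
from enum import Enum


class RiskTier(Enum):  # hypothetical variants; the source does not list them
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


@dataclass
class Precise:
    churn_risk_tier: RiskTier = field(
        metadata={"description": "Assess voluntary customer departure."}
    )


@dataclass
class Opaque:
    x7: RiskTier = field(metadata={"description": "A value."})


def formal_view(cls):
    """What a name-erasing checker keeps: field types in order, nothing else."""
    return [f.type for f in fields(cls)]


def neural_view(cls):
    """The text a model would read: names, descriptions, and variant names."""
    return json.dumps(
        {
            f.name: {
                "description": f.metadata["description"],
                "variants": [v.name for v in f.type],
            }
            for f in fields(cls)
        }
    )


print(formal_view(Precise) == formal_view(Opaque))
print(neural_view(Precise) == neural_view(Opaque))
```

The two declarations agree in the formal view and differ in the neural view, which is the disagreement between the two interpreters in miniature.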

How far the names can move the output

Start with what is exactly true. Not a bound — an identity. For a field $f$ and a given input $x$ (one customer profile, one document), the output's uncertainty splits three ways:

$$\underbrace{H(Y_f \mid x)}_{\text{room the document leaves}} = \underbrace{I(N; Y_f \mid x)}_{\text{room the name takes}} + \underbrace{H(Y_f \mid x, N)}_{\text{room left to noise}}$$

where $N$ is the naming variant and $Y_f$ is the model's output for field $f$. Nobody can call this loose: it is the definition of conditional mutual information, rearranged. Read it left to right:

  • $H(Y_f \mid x)$: given this profile, before the field's name says anything, how undetermined the answer is. A profile that screams one tier (churned, two-month tenure, four support calls) pins it: near 0. A genuinely borderline profile, spread across three plausible tiers, leaves it open: around 1.5 bits, and never more than $\log_2 3 \approx 1.58$. This is the room available, and the document sets it.
  • $I(N; Y_f \mid x)$: how much of that room the name takes, with churn_risk_tier versus retention_offer_aggressiveness collapsing "which of these values" down to one. This is the semantic channel, measured against this document, the only place it is real.
  • $H(Y_f \mid x, N)$: what remains once document and name have both spoken. Model noise: the part a better model removes, not a better schema.

The whole claim rests on the left-hand term. A name can only instruct where the document was silent. A name's power is not a property of the name; it is capped by the uncertainty the document leaves behind. A precise field name is worth nothing on a profile that pins its own answer, and worth its full weight on a borderline one. The identity says why: the noise term $H(Y_f \mid x, N)$ is an entropy and cannot be negative, so $I(N; Y_f \mid x) \le H(Y_f \mid x)$, and the room $H(Y_f \mid x)$ is set by the document, not by your type. Naming is most load-bearing exactly where the input is most ambiguous. That is the mechanism, not a hedge, and it is the opposite of the "wide field, unbounded influence" story, which mistook the type for the variable that mattered.
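
The decomposition can be checked numerically. The sketch below uses invented distributions, chosen only to illustrate the arithmetic and not measured from any model. The helper names `entropy` and `decompose` are hypothetical. The name's share is computed independently as an expected KL divergence, so the final assertion is a real check of the identity and not a restatement of it.

```python
import math


def entropy(p):
    """Shannon entropy in bits of a probability vector."""
    return -sum(q * math.log2(q) for q in p if q > 0)


def decompose(p_n, p_y_given_n):
    """Return (room, name_share, noise) for one fixed input x.

    p_n[n] is the probability of naming variant n; p_y_given_n[n][y] is the
    output distribution over values y under that variant.
    """
    n_values = len(p_y_given_n[0])
    pairs = list(zip(p_n, p_y_given_n))
    p_y = [sum(pn * row[y] for pn, row in pairs) for y in range(n_values)]
    room = entropy(p_y)                               # H(Y_f | x)
    noise = sum(pn * entropy(row) for pn, row in pairs)  # H(Y_f | x, N)
    # I(N; Y_f | x), computed independently as an expected KL divergence.
    name_share = sum(
        pn * row[y] * math.log2(row[y] / p_y[y])
        for pn, row in pairs
        for y in range(n_values)
        if pn > 0 and row[y] > 0
    )
    return room, name_share, noise


# Borderline profile: the two names pull the answer toward different tiers.
borderline = decompose([0.5, 0.5], [[0.8, 0.1, 0.1], [0.1, 0.1, 0.8]])
# Decisive profile: the document pins the answer whatever the field is called.
decisive = decompose([0.5, 0.5], [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]])

for room, name_share, noise in (borderline, decisive):
    assert math.isclose(room, name_share + noise, abs_tol=1e-12)
    assert name_share <= room + 1e-12
```

On the decisive profile the name's share is zero because there is no room to take; on the borderline profile it is strictly positive.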

Keep a ceiling, too; security claims need one. It is the same chain with the worst case made explicit:

$$I(N; Y_f \mid x) \le H(Y_f \mid x) \le \log_2 \lvert V_f\rvert \qquad (\text{output confined to } V_f)$$

| Type constraint | $\log_2\lvert V_f\rvert$ | What it bounds |
| --- | --- | --- |
| bool | 1 bit | worst-case room, any document |
| 4-variant union | 2 bits | worst-case room, any document |
| unconstrained str | unbounded | type leaves the room uncapped |

Three authors set the three terms. The type sets the worst-case room $\log_2\lvert V_f\rvert$ — the most this field could ever be undetermined. The document sets the actual room $H(Y_f \mid x)$ — how undetermined it is here, usually far less. The name takes at most the actual room. The old instinct that influence is a fact about enum size was reading the wrong author: the type owns only the worst case. A wide field over a decisive document is safe; a narrow field over a genuinely ambiguous one is where the name quietly drives the whole choice. Security keeps the ceiling — a narrow $\lvert V_f\rvert$ caps a poisoned name's blast radius on every document, since $H(Y_f \mid x) \le \log_2\lvert V_f\rvert$ holds for all $x$. Builders get the dial — $H(Y_f \mid x)$ is what you actually spend, per call.
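
The worst-case room can be read straight off a type. The following is a rough sketch, assuming only `bool`, `Enum` subclasses, and `typing.Literal` count as closed value sets; `ceiling_bits` and the four-member `RiskTier` are hypothetical names used for illustration.

```python
import math
from enum import Enum
from typing import Literal, get_args, get_origin


class RiskTier(Enum):  # hypothetical four-variant union
    NONE = "none"
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


def ceiling_bits(tp):
    """Worst-case room log2|V_f| for a field type, in bits."""
    if tp is bool:
        return 1.0
    if isinstance(tp, type) and issubclass(tp, Enum):
        return math.log2(len(tp))
    if get_origin(tp) is Literal:
        return math.log2(len(get_args(tp)))
    return math.inf  # e.g. unconstrained str: the type leaves the room uncapped


print(ceiling_bits(bool), ceiling_bits(RiskTier), ceiling_bits(str))
```

This number is only the type's contribution. The room actually available on a given call is $H(Y_f \mid x)$, which this function cannot see.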

One caveat, before a careful reader supplies it: this sizes one field. $I(N; Y_f \mid x)$ says nothing about a name in one field reframing how the model reads another; that cross-field influence lives in a joint quantity that is generally larger. The identity is the right tool for sizing a single channel and the wrong one for certifying a system.

The discipline: progressive hardening

The dial gives you a development loop for mixed formal and neural systems.

flowchart LR
    A["Steer with language<br/>precise names, clear descriptions, good variant names"] --> B["Observe where the model fails"]
    B --> C["Harden the failure into structure<br/>narrow the type, add validators, split wide fields"]
    C --> D["Spend semantic bandwidth, buy a structural guarantee"]

Steer with language first, because it is cheap and often enough. Watch where the model fails. Harden each failure into structure, converting "the name asks for this" into "the type permits only this." Each step moves a failure from the semantic channel, where it is merely unlikely, to the structural channel, where it is impossible. That is the entire craft, and the bound tells you the exchange rate.
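
One hardening step might look like the sketch below. It assumes the observed failure was an out-of-set value such as "kinda high" arriving in a free-text field. The class names and the three `RiskTier` variants are hypothetical, and plain dataclasses stand in for whichever typed construction library is in use.

```python
from dataclasses import dataclass
from enum import Enum


class RiskTier(Enum):  # hypothetical variants
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


@dataclass(frozen=True)
class SoftAssessment:
    # Step 1: steered by the name only; any string is admissible.
    churn_risk_tier: str


@dataclass(frozen=True)
class HardAssessment:
    # Step 2: the observed failure is hardened into structure.
    churn_risk_tier: RiskTier

    def __post_init__(self):
        # Enum lookup raises ValueError for anything outside the variant set,
        # so an invalid value can never become a HardAssessment.
        object.__setattr__(
            self, "churn_risk_tier", RiskTier(self.churn_risk_tier)
        )


soft = SoftAssessment("kinda high")   # constructs: merely unlikely, not impossible
hard = HardAssessment("high")         # constructs, and carries a RiskTier
try:
    HardAssessment("kinda high")
    rejected = False
except ValueError:
    rejected = True
```

The name still does the steering in both versions. What changes is that the failure moves from something the name discourages to something the constructor refuses.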

Instruct, constrain, certify

A prompt instructs. A semantic index type does three jobs at once.

| Artifact | Instructs | Constrains | Certifies |
| --- | --- | --- | --- |
| Prompt | yes | no | no |
| Schema text alone | sometimes | weakly | no |
| Semantic index type in typed construction | yes | yes | yes |

It instructs through names and descriptions, constrains through types and unions, and certifies through construction: in a typed construction system, a value that exists has already satisfied everything its type declares. This is "Parse, Don't Validate" at whole-program scope, where construction is the proof and the constructed value carries its guarantee forward. The prompt template, the validation pipeline, and the orchestration glue stop being three artifacts kept in sync. They collapse into one declaration doing three jobs, read by the two interpreters above: the formal one that checks and the neural one that understands. That is why this is a systems concept, not a prompting trick.

The same mechanism is an attack surface

If a name computes, a name is an instruction, and the moment instruction travels in a data channel you have the injection problem, the same vulnerability class as SQL injection at the schema level. A field name or description drawn from untrusted input is executable text reaching the model.

So the engineering story and the security story are one story, both about control of the semantic channel. The defenses are the familiar ones applied to schema text: provenance, sanitization, least privilege, and structural containment. The ceiling returns here as a security primitive: a narrow $\lvert V_f\rvert$ caps the blast radius of a poisoned name on every document, since $H(Y_f \mid x) \le \log_2\lvert V_f\rvert$ holds for all $x$.
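
As one narrow illustration of provenance applied to schema text, the sketch below admits a field name only if it is a plain identifier and appears in an allowlist authored in code. The allowlist contents, the pattern, and the function name `admit_field_name` are assumptions made for this example. A syntactic check alone is not a defense, because a well-formed identifier can still carry a misleading meaning.

```python
import re

# Hypothetical allowlist: names written by the schema author, never by input.
TRUSTED_FIELD_NAMES = frozenset(
    {"churn_risk_tier", "retention_offer_aggressiveness"}
)
_IDENTIFIER = re.compile(r"[a-z][a-z0-9_]{0,63}")


def admit_field_name(name: str) -> str:
    """Return the name if it may reach the model as schema text, else raise."""
    if not _IDENTIFIER.fullmatch(name):
        raise ValueError(f"not a plain identifier: {name!r}")
    if name not in TRUSTED_FIELD_NAMES:
        # Well-formed but of unknown provenance: still executable text.
        raise ValueError(f"field name not in allowlist: {name!r}")
    return name


def is_admitted(name: str) -> bool:
    try:
        admit_field_name(name)
        return True
    except ValueError:
        return False


print(is_admitted("churn_risk_tier"))
print(is_admitted("ignore_previous_instructions"))
```

The second name passes the identifier pattern and is rejected only by the allowlist, which is why provenance and structural containment carry the weight and sanitization alone does not.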

What is established, what is predicted

Separate two claims that are easy to blur, and state each at its real strength.

The phenomenon is established. That neural consumers shift behavior under structure-preserving renaming is documented by converging evidence from three independent research communities: schema-guided dialogue, text-to-SQL, and code language models. This project does not discover that effect. It unifies three separate literatures under one abstraction, alpha-equivalence violation, which is a lower burden than discovery and a stronger position to argue from: the effect is not in question, only its framing was missing.

The in-domain measurement is predicted. Those communities studied dialogue, SQL, and code, not Pydantic structured output. So the specific magnitude, and the behavior under the decisive misleading condition, in this target domain, is predicted-not-yet-shown. A preliminary design for measuring it lives in experiment.md; it has not been run, and this document treats nothing in it as a result. Nor does the claim lean on it: the effect is directly observable by anyone who renames a field in their own schema and watches the output move — so the measurement is corroboration, not foundation.

The experiment isolates the prediction across four structurally isomorphic schema variants.

| Variant | Semantic content | What it tests |
| --- | --- | --- |
| Baseline | precise names plus descriptions | correct semantic indexing |
| Names-only | names kept, descriptions removed | identifiers versus prose |
| Vacuous | field_1, OPTION_A, generic text | semantic channel removed |
| Misleading | coherent wrong-domain naming | different computation, same structure |
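
Two of the four variants can be derived mechanically from the baseline, which is one way to keep them structurally isomorphic. This sketch is not taken from experiment.md. The dictionary layout, the example baseline, and the helpers `shape`, `names_only`, and `vacuous` are assumptions for illustration. The misleading variant has to be written by hand, since it needs coherent wrong-domain names.

```python
def shape(schema):
    """Structure with names erased: number of variants per field, in order."""
    return [len(spec["variants"]) for spec in schema.values()]


def names_only(schema):
    """Keep field and variant names, remove the descriptions."""
    return {name: {**spec, "description": ""} for name, spec in schema.items()}


def vacuous(schema):
    """Replace every name with a generic one and every description with filler."""
    out = {}
    for i, spec in enumerate(schema.values(), start=1):
        letters = [chr(ord("A") + j) for j in range(len(spec["variants"]))]
        out[f"field_{i}"] = {
            "description": "A value.",
            "variants": [f"OPTION_{c}" for c in letters],
        }
    return out


# Hypothetical baseline schema for illustration.
baseline = {
    "churn_risk_tier": {
        "description": "Assess voluntary customer departure.",
        "variants": ["LOW", "MEDIUM", "HIGH"],
    }
}

variants = {
    "baseline": baseline,
    "names_only": names_only(baseline),
    "vacuous": vacuous(baseline),
}
# Every variant must present the same structure to the formal interpreter.
assert all(shape(v) == shape(baseline) for v in variants.values())
```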

The misleading condition is the decisive one. Vacuous naming only removes guidance, so it can only make the output noisier. Misleading naming, coherent but wrong-domain names over identical structure, tests whether the model computes a different function when the names point elsewhere. That makes it a directional claim — the output should not merely scatter, it should move toward the value the wrong names point at — so it demands a directional measurement, not a symmetric one. If the output stays structurally valid while the mass moves to where the names point, the schema is not formatting. It is part of the task.
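
The difference between a symmetric and a directional measurement can be shown on toy numbers. The distributions below are invented to illustrate the two metrics and are not results. Total variation distance stands in for a symmetric measure, and `shift_toward` is a hypothetical helper that reports how much probability mass moved onto the value the misleading names point at.

```python
import math


def total_variation(p, q):
    """Symmetric distance between two output distributions over the same values."""
    return 0.5 * sum(abs(p[k] - q[k]) for k in p)


def shift_toward(target, baseline, variant):
    """Directional measure: mass gained by the target value under the variant."""
    return variant[target] - baseline[target]


# Invented output distributions over three tiers.
baseline = {"LOW": 0.7, "MEDIUM": 0.2, "HIGH": 0.1}
moved = {"LOW": 0.3, "MEDIUM": 0.2, "HIGH": 0.5}      # mass lands on the target
scattered = {"LOW": 0.3, "MEDIUM": 0.6, "HIGH": 0.1}  # mass lands elsewhere

target = "HIGH"  # assumed to be the value the wrong-domain names point at
for label, dist in (("moved", moved), ("scattered", scattered)):
    print(label, total_variation(baseline, dist), shift_toward(target, baseline, dist))
```

Both variants sit at the same total variation distance from the baseline, yet only one of them moved toward the target. A symmetric measure cannot separate the two cases, which is why the misleading condition needs a directional one.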

License

Source code is MIT. Written content is CC BY 4.0. See LICENSE.
