
OVOS-TRANSFORM-1: Transformer Plugins Specification #20

Draft
JarbasAl wants to merge 36 commits into dev from spec/transformer

Conversation

JarbasAl (Member) commented May 24, 2026

Companion issue: #19

Summary

Defines a single normative contract for all six transformer plugin types that run at fixed injection points around the utterance lifecycle.

What the spec covers

  • §2 — Six transformer types: audio, utterance, metadata, intent, dialog, TTS
  • §3 — Six lifecycle injection points:
    • §3.1 Audio (pre-STT): acoustic processing
    • §3.2 Utterance (post-STT): text normalisation, STT correction
    • §3.3 Metadata (post-utterance): entity/signal extraction into session
    • §3.4 Intent (post-match, pre-dispatch): capture enrichment, slot normalisation
    • §3.5 Dialog (post-skill, pre-TTS): persona/tone rewriting, translation — runs in audio-output layer
    • §3.6 TTS (post-synthesis, pre-playback): audio-domain effects
  • §4 — Chain ordering: ascending priority (lower = earlier); deployer-configured explicit order wins
  • §5 — Per-session chain overrides via session.{audio,utterance,metadata,intent,dialog,tts}_transformers fields
  • §6 — Passive registration index per type: ovos.transformer.{type}.list / .list.response
  • §7 — Error handling: exceptions and shape violations become no-ops; orchestrator logs and proceeds
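To illustrate the §4 ordering and §7 error-handling rules summarised above, here is a minimal Python sketch of an orchestrator running one chain. It is an illustration under stated assumptions, not the spec's or ovos-core's implementation: `Transformer`, `order_chain`, and `run_chain` are hypothetical names, the `(utterances, context)` signature is assumed for an utterance-type chain, and the default priority of 50 is the value mentioned in a commit message later in this PR.

```python
import logging
from dataclasses import dataclass
from typing import Callable, List, Optional

LOG = logging.getLogger("transformer-chain")


@dataclass
class Transformer:
    # Hypothetical container; the spec does not define this class.
    transformer_id: str
    transform: Callable[[list, dict], tuple]
    priority: int = 50  # assumed default, per the commit notes in this PR


def order_chain(loaded: List[Transformer],
                explicit_order: Optional[List[str]] = None) -> List[Transformer]:
    """Explicit deployer order wins; otherwise ascending priority (lower = earlier)."""
    if explicit_order:
        by_id = {t.transformer_id: t for t in loaded}
        return [by_id[tid] for tid in explicit_order if tid in by_id]
    # sorted() is stable, so equal priorities keep their load order.
    return sorted(loaded, key=lambda t: t.priority)


def run_chain(chain: List[Transformer], utterances: list, context: dict):
    """Run every transformer; a raise or a malformed return is a no-op."""
    for t in chain:
        try:
            new_utts, new_ctx = t.transform(utterances, context)
            if not isinstance(new_utts, list) or not isinstance(new_ctx, dict):
                raise TypeError("shape violation")
        except Exception:
            # Log and proceed with the previous step's values unchanged.
            LOG.exception("transformer %s failed; skipping", t.transformer_id)
            continue
        utterances, context = new_utts, new_ctx
    return utterances, context
```

A transformer that raises or returns the wrong shape simply drops out of the result; the remaining transformers still run against the last good values.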

JarbasAl added a commit that referenced this pull request May 24, 2026
Per the new "dedicated APPENDIX PR" policy, consolidating the
prior-art and design-deviation notes from the OVOS-CONTEXT-1
(PR #18) and OVOS-TRANSFORM-1 (PR #20) work into this PR.
Those spec PRs are now scoped to their own spec files only;
the discussion / cross-spec touchups / in-tree prior art
all live here.

Adds to §4 Design rationale:
- "Intent context (CONTEXT-1)" — the Adapt-only origins, the
  two-scope (private/shared) formalization, jurebes /
  nebulento / palavreado as prior art for excludes_context,
  the engine-side §5.3 mutation pathway resolving the
  PIPELINE-1 §4.2 contradiction.
- "Transformer plugins (TRANSFORM-1)" — the architectural-
  pattern framing, intent transformers as the system-typing
  home, the nine concrete in-tree plugins as prior art, the
  ascending-vs-descending priority deviation called out,
  cancellation alignment with existing plugin convention,
  and the language disambiguation hierarchy mirroring current
  ovos-core code paths.

Removes from §7 Known gaps:
- "Intent context" bullet (formalized in CONTEXT-1).
- "The utterance-transformer chain" bullet (formalized in
  TRANSFORM-1).
JarbasAl and others added 24 commits May 26, 2026 14:34
Adds a new specification covering the six transformer plugin types
OVOS already runs informally (audio, utterance, metadata, intent,
dialog, TTS) as a single unified spec with one shared abstraction
and six per-type subsections.

Highlights:
- Six lifecycle hooks defined precisely against OVOS-PIPELINE-1's
  per-utterance flow. Each hook runs an ordered chain of black-box
  transformers; every transformer in the chain runs (no
  claim-or-decline).
- Intent transformers (§3.4) called out as the spec'd home for
  system-type entity injection (dates, numbers, durations,
  ordinals, named entities) - the LLM-friendly place to apply
  OVOS-INTENT-1 §5.3's deferred slot typing globally without each
  skill rolling its own date parser.
- Per-type "Where LLMs fit" notes through §3.1-§3.6.
- Ascending priority ordering (lower = earlier; default 50),
  aligning with the fallback-skill convention already in OVOS;
  explicit deployer order wins when present. Current OVOS sorts
  transformers descending - APPENDIX flags this as a normative
  deviation for current plugins to flip.
- Per-session chain overrides via optional
  session.{audio,utterance,metadata,intent,dialog,tts}_transformers
  fields, parallel to OVOS-PIPELINE-1's session.pipeline.
  Restricted / remote-peer sessions can disable LLM-backed
  transformers they don't trust off-network.
- Passive registration index per type (transformer.*.list with
  .response), same posture as PIPELINE-1's intent index and
  CONTEXT-1's §5.4 context index.
- Robust error handling: transformer exceptions and shape
  violations are treated as no-op transforms; the orchestrator logs
  and proceeds. A single transformer's bug never aborts the
  utterance (mirrors PIPELINE-1 §6.2).
- Non-goals: typed-value schemas, per-plugin behavioural
  contracts, cross-transformer coordination, hot reload, timeout
  policy.

PR scope is just transformer.md + README index + CHANGELOG entry
+ APPENDIX prior-art bullet. Cross-spec touches (PIPELINE-1 flow
annotation, MSG-1 §4 session-fields note, INTENT-1 §5.3 forward
reference) deferred to follow-up PRs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Addresses every must-fix from the end-to-end review plus introduces
the cancellation semantics that current OVOS expresses informally
via ovos-utterance-plugin-cancel.

Must-fixes:

- §3.4 ordering with OVOS-CONTEXT-1 §5.3 disambiguated: engine-side
  session mutation runs FIRST, intent-transformer chain runs second.
  Lets an intent transformer read context the matching engine just
  wrote (the natural enrichment direction).

- §3.4 owner_id/intent_name MUST NOT change is now backed by an
  orchestrator enforcement clause: violations are treated as §7
  shape violations, the orchestrator discards the transformer's
  output and proceeds with the prior step's Match unchanged.

- §3.2 utterance transformers MAY now return an empty list (signals
  "no plausible transcription"); covers the existing
  transcription-validator pattern that the previous MUST contradicted.

- §3.4 / §3.3 capture/metadata deletion rules softened from MUST NOT
  to SHOULD NOT, with carve-outs for filtering / redaction / PII
  removal as the deployer-configured purpose.

- §1.1 / §2 — flow diagram explicitly framed as the canonical staged
  flow this specification assumes. Streaming and end-to-end
  implementations may omit hooks for stages they don't materialise,
  per published conformance scope.

- §3.6 TTS "MUST NOT silently change perceived language" softened to
  SHOULD NOT re-synthesize in a different language; the vague,
  unenforceable rule is clarified by reference to the staging
  (translation is §3.5 dialog territory).

New §8 — Utterance cancellation:

Two surfaces sharing one termination contract:

- §8.1 in-band: a transformer signals cancellation by setting a
  reserved cancel:{by, reason} key in its returned context.
  Orchestrator inspects after every transformer; aborts remaining
  chain and all subsequent stages on sight. Synchronous, race-free.

- §8.2 out-of-band: ovos.utterance.cancel bus event, emittable by
  any participant (skills, devices, peers, debugging tools). The
  orchestrator handles best-effort against in-flight lifecycles;
  §8.4 documents the inherent async race and the consequence (late
  cancels MUST be logged but cannot unwind side effects, consistent
  with §7's no-rollback rule).

- §8.3 terminal events: cancellation terminates with
  ovos.utterance.cancelled (new) followed by ovos.utterance.handled
  (OVOS-PIPELINE-1 §9.5). Orchestrator MUST NOT emit
  complete_intent_failure on the cancellation path — failure and
  cancellation are distinct observables. The cancelled event carries
  the same by/reason pair the cancellation signal carried.

Other review items addressed:

- §6 — adds an aggregate transformer.list query alongside the six
  per-type queries, for debugging tools that don't want six
  round-trips. priorities field in responses is now always returned
  (no more "absent when explicit-order overrides priority"
  cleverness). loaded vs chain distinction clarified.

- §5 — explicit wire-weight note: per-session override fields cost
  zero for sessions that don't set them; deployers using them for
  high-traffic sessions SHOULD keep lists short.

- §7 — concurrency contract (transformer instances are process-wide
  and MUST be re-entrant; per-instance state MUST be guarded for
  concurrent access) and no-rollback note (side effects performed
  mid-chain are not unwound by later raises or §8 cancellations;
  transformers needing transaction semantics implement them
  internally).

Conformance (§9) and the new §8 rules thread through both
orchestrator and transformer MUST/MAY bullets.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two related reframes to put the spec into proper prescriptive
voice-OS-design territory:

A. Transformer chains as architectural pattern, not mandate
--------------------------------------------------------------

§1 / §2 / §9 reframed throughout: an orchestrator MAY implement
transformer chains at any subset of the six injection points
(including none). For each chain it implements, the corresponding
contract binds; for chains it does not implement, no obligation
arises. A null-implementation that runs no chains is conformant.
The spec defines the design pattern and the per-injection-point
contract, not a feature list.

§3 reframed: each of §3.1-§3.6 now opens with an explicit
**architectural rationale** explaining WHY this exact point in the
lifecycle is the right home for this kind of work — what artefact
exists here that doesn't exist elsewhere, what mutations are
possible here that aren't possible at any other point. The
rationale is the justification for the injection point existing in
the spec at all; the canonical use cases and LLM-fit notes follow
from the rationale. This makes the spec prescriptive (here's how a
voice OS should be structured and why) rather than descriptive
(here's what current OVOS does).

Per-rationale highlights:
- §3.1 audio: only point where unprocessed audio exists. STT
  destroys prosody / acoustic-language / speaker info.
- §3.2 utterance: only point with text but no semantic commitment.
  Mutations ripple uniformly through every downstream engine.
- §3.3 metadata: only point with the joint audio+text signal and
  no intent commitment. Derive once, every consumer sees same value.
- §3.4 intent: only point with BOTH resolved intent identity AND
  free-text captures. Engine-agnostic enrichment surface.
- §3.5 dialog: only point where response exists as final text and
  TTS has not committed to how it sounds.
- §3.6 TTS: only point with the synthesized waveform — audio-domain
  mutations have nowhere else to live.

B. Cancellation is exclusively a transformer plugin contract
--------------------------------------------------------------

§8 reframed: cancellation is signalled only by a transformer
setting the in-band `cancel` context flag. There is no
`ovos.utterance.cancel` bus event a third party can emit; the
orchestrator owns the cancellation machinery and exposes only the
plugin contract. Deployments that want out-of-band cancellation
(hardware buttons, peer signals, channel barge-in) ship a thin
transformer that watches for the trigger and sets the §8.1 flag.

This consolidates §8 from two surfaces to one, removes the §8.4
race section (no out-of-band path = no async race), and makes the
spec's surface area for cancellation a single plugin contract.

New §10 non-goal: explicit "out-of-band cancellation channels" are
out of scope by design.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…tors

Reframes the metadata-transformer chain so its input and output are
explicitly the full Message.context (OVOS-MSG-1 §2.3), not a
sub-context object, and so its permitted mutations are
unrestricted across that surface.

Key changes to §3.3:

- Input: full Message.context including session carrier (§4 of
  MSG-1), session.context (CONTEXT-1 §2), session.pipeline
  (PIPELINE-1 §5), the six per-session transformer overrides (§5
  of this spec), routing keys (§3 of MSG-1), and any other
  top-level keys other specs or earlier transformers have written.

- Permitted mutations: a metadata transformer MAY mutate
  Message.context however it sees fit. Replaces the previous
  "MAY add/update keys; SHOULD NOT remove keys it did not set;
  MUST NOT modify reserved keys" with a single permissive rule:
  the chain is the deployer's in-process Message.context
  manipulation surface; loading a transformer is the deployer's
  authorization.

- Coordination guidance (SHOULD-level): notes the consequences of
  mutating each companion-spec field — session.context bypasses
  CONTEXT-1 §5 stamping; session.pipeline reroutes this utterance;
  session.lang affects downstream localization; source/destination
  affects routing for derived messages.

- New canonical use cases reflecting the expanded purview:
  per-utterance language override (write to session.lang),
  per-utterance pipeline switch (replace session.pipeline for one
  turn), system context injection (write session.context entries
  bypassing CONTEXT-1's bus events when provenance is not needed).

- LLM-fit note expanded: LLMs as per-utterance pipeline routers,
  not only as metadata classifiers.

Conceptually: §3.3 is now the spec's "anything-goes Message.context
operator" — distinct from §3.4 intent transformers (Match operators)
and from CONTEXT-1 §5 bus events (skill-side, provenance-stamped).
Three different mutation surfaces for three different deployment
intents.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…stress test

Reading ovos-bidirectional-translation-plugin/__init__.py end-to-end
exposed three ambiguities in the spec wording that the plugin works
around informally. All three are spec-side fixes (the plugin's
behaviour is reasonable and should be conformant under a tighter
reading).

§3.2 / §3.5 / §3.6 — input clarified as full Message.context
-------------------------------------------------------------

The plugin mutates session.lang from inside both its utterance
transformer (§3.2) and its dialog transformer (§3.5) halves. The
prior §3.2/§3.5 wording said "MAY merge keys into the context
object" — ambiguous about whether session-internal mutations
qualify. §3.3's recent reframe explicitly empowered metadata
transformers for full Message.context mutation; the spec read as if
§3.2/§3.5/§3.6 had weaker rights.

Clarified across §3.2, §3.5, §3.6: the `context` argument is the
full `Message.context` (same surface §3.3 covers), and §3.3's
permissive mutation rules apply uniformly. §3.3's distinguishing
trait is that it has no primary artifact input — context is its
only working surface — not that it has special mutation rights
other transformer types lack.

§7 — mid-lifecycle session-mutation propagation made explicit
-------------------------------------------------------------

The plugin's bidirectional pattern depends on an unwritten invariant:
when a transformer mutates session.lang and re-serializes session
into context["session"], every downstream stage reads the mutated
value rather than a cached pre-mutation copy. The spec implied this
through OVOS-MSG-1 §5's forward/reply semantics but never stated it.
New §7 clause makes the propagation rule explicit and tells
downstream consumers they MUST read live session values from
in-flight Message.context.

§7 — cross-transformer coordination via namespaced context keys
---------------------------------------------------------------

The plugin uses bare top-level context keys (`was_translated`,
`output_lang`, `translate_dialogs`) to coordinate between its two
halves. Works here because both halves are the same plugin sharing
a convention; two unrelated plugins picking the same bare key would
collide. §10 non-goal already says cross-transformer coordination
protocols are out of scope — but plugins are doing it via
convention regardless. New §7 clause: transformers SHOULD namespace
ad-hoc coordination keys with their transformer_id as a prefix
(e.g. `ovos-utterance-translation-plugin.output_lang` instead of
bare `output_lang`). The spec defines no central registry;
namespacing is what makes that absence safe.
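
The namespacing recommendation above can be sketched in a few lines of Python. The helper names `ns_key`, `write_coordination`, and `read_coordination` are hypothetical; only the `<transformer_id>.<key>` shape comes from the commit message, and the `.` separator is taken from its example.

```python
def ns_key(transformer_id: str, key: str) -> str:
    """Build a namespaced context key: '<transformer_id>.<key>'."""
    return f"{transformer_id}.{key}"


def write_coordination(context: dict, transformer_id: str, key: str, value) -> dict:
    """Store an ad-hoc coordination value under the plugin's own namespace."""
    context[ns_key(transformer_id, key)] = value
    return context


def read_coordination(context: dict, transformer_id: str, key: str, default=None):
    """Read the value back; other plugins' bare or namespaced keys are ignored."""
    return context.get(ns_key(transformer_id, key), default)
```

For example, `ns_key("ovos-utterance-translation-plugin", "output_lang")` yields the key quoted in the commit message, so an unrelated plugin writing a bare `output_lang` no longer collides with it.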

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…onvention

Stress-tested §8 against two in-tree utterance transformers that
already implement cancellation today:

- ovos-utterance-plugin-cancel (NevermindPlugin) — matches locale
  cancel words, returns [], {"canceled": True, "cancel_word": ...}
- ovos-transcription-validator-plugin (TranscriptionValidatorPlugin) —
  LLM-based STT validation, returns [], {"canceled": True,
  "cancel_word": "[MISTRANSCRIPTION]"} when LLM rejects the
  transcription

Both use the same de facto convention: empty list as utterance
output, with `canceled: true` and `cancel_word: <reason>` as
top-level context keys. The previous §8.1 draft proposed a
different nested shape (`cancel: {by, reason}`). Adopted the
existing convention into the spec; the principle is the same one
that drove §3.3's reframe — trust real prior plugin design rather
than imposing a new shape.

§8.1 changes:
- Cancellation signal is two flat keys: `canceled: bool` and
  `cancel_word: string`. Both MUST be present together when
  signalling; one without the other is a §7 shape violation.
- Orchestrator stamps `cancel_by: <transformer_id>` automatically
  on observing the signal (parallels OVOS-CONTEXT-1 §5.2's
  origin-stamping rule — transformers can't impersonate each
  other's cancellations).
- Explicit note that when `canceled: true` is observed alongside
  an empty utterance list, the flag is the signal — the empty
  list is convention, not the trigger.

§3.2 changes:
- Distinguishes empty list alone (no plausible transcription →
  complete_intent_failure) from empty list + §8.1 keys
  (cancellation → ovos.utterance.cancelled). Both NevermindPlugin
  and TranscriptionValidatorPlugin are the cancellation case;
  spec now spells out which terminal event each shape produces.

§8.2 and §9 conformance updated to use the new key names
(cancel_word + cancel_by instead of by/reason). The transformer
MUST bullet now says "set canceled: true and cancel_word in
returned context; orchestrator stamps cancel_by".

LLM stress-test note: TranscriptionValidatorPlugin's design (LLM
classifier called inside transform, reprompt-or-error fire-and-
forget bus emit) validates §9's "MAY access the bus for
side-effects, SHOULD NOT depend on synchronous bus responses".
Concrete real-world example of the spec's *Where LLMs fit* §3.2
note - direct prior art.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The §8.1 cancellation signal's "why" field is renamed cancel_word
-> cancel_reason. cancel_word was incidental plugin metadata in
ovos-utterance-plugin-cancel and ovos-transcription-validator-
plugin (the actual cue / sentinel they happened to surface); the
spec's normative field is the structured concept "why was this
cancelled", which reads more cleanly as cancel_reason.

Plugins MAY continue to set their own cancel_word (or any other
top-level metadata) alongside the §8.1 keys — that's plugin-
specific and outside the spec's contract. §8.1 calls this out
explicitly and points at §7's namespacing guidance for ad-hoc
keys.
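
A hedged Python sketch of how an orchestrator might inspect the §8.1 signal after each transformer, using the key names as of this rename (`canceled`, `cancel_reason`, and the orchestrator-stamped `cancel_by`). The function and exception names are hypothetical, and the sketch assumes the both-keys-together rule from the previous commit carries over to `cancel_reason` unchanged.

```python
from typing import Optional


class ShapeViolation(ValueError):
    """Raised when only one half of the cancellation signal is present."""


def check_cancel(context: dict, transformer_id: str) -> bool:
    """Return True if the transformer just cancelled the utterance."""
    has_flag = context.get("canceled") is True
    has_reason = isinstance(context.get("cancel_reason"), str)
    if has_flag != has_reason:
        # One key without the other: the caller treats this as a
        # shape violation and discards the transformer's output.
        raise ShapeViolation("canceled and cancel_reason must be set together")
    if has_flag:
        # The orchestrator stamps the origin itself, overwriting anything
        # the plugin wrote, so plugins cannot impersonate one another.
        context["cancel_by"] = transformer_id
    return has_flag


def terminal_event(utterances: list, context: dict) -> Optional[str]:
    """Pick the terminal event for the two empty-list shapes in §3.2."""
    if context.get("canceled") is True:
        return "ovos.utterance.cancelled"   # the flag is the signal
    if not utterances:
        return "complete_intent_failure"    # empty list alone
    return None                             # lifecycle continues
```

A plugin-specific `cancel_word` may still sit alongside these keys; the sketch ignores it because it is outside the spec's contract.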

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…s/dialog-normalizer audit

Reading three more in-tree transformer plugins surfaced three
small but real spec ambiguities. All three are spec-side fixes.

ovos-utterance-normalizer (UtteranceNormalizerPlugin):
  Triples the utterance list per input — for each candidate emits
  expand_contractions(u), original u, normalize(u). Deduplicates.
  Proves §3.2 explicitly supports list expansion (already implicit
  in "MAY rewrite, expand, or contract").

ovos-utterance-corrections-plugin (UtteranceCorrectionsPlugin):
  XDG-stored full-utterance / regex / word replacement tables.
  Mutates utterances list IN PLACE (utterances[idx] =
  compiled_pattern.sub(...)). Step-1 full replace runs against
  utterances[0] only; steps 2-3 iterate the whole list. Surfaces
  the "utterances[0] is primary" implicit convention and the
  in-place vs new-list mutation ambiguity.

ovos-dialog-normalizer-plugin (DialogNormalizerTransformer):
  Canonical example of §3.5's "why this injection point" rationale:
  "I'm Dr. Prof. 12345 €" → "I am Doctor Professor twelve thousand
  three hundred forty-five euros". This can't live in skills
  (which shouldn't know TTS handling of currency symbols) and
  can't live in TTS transformers (operating on audio bytes).

Spec changes:

§3.2 — utterances[0] formalized as primary candidate. Later
indices are alternatives downstream matchers MAY try. Codifies
the de facto convention seen across all five §3.2 plugins audited
so far.

§3.2 — in-place mutation MAY be performed, or a new list returned;
both are conformant. ovos-utterance-corrections-plugin (in-place)
and the others (new lists) currently exercise both shapes; spec
now permits both explicitly.

§7 — guidance on reading language from Message.context resolves
a real cross-plugin inconsistency:
  - context["lang"] only (cancel, validator, normalizer)
  - sess.lang only (dialog normalizer)
  - both with fallbacks (translator, validator)
session.lang is now spec'd as the canonical preferred-language
signal; top-level context["lang"] (when present) is the
per-utterance override populated from data.lang per OVOS-MSG-1
§4.2. Transformers MUST tolerate both shapes.
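
The list-expansion behaviour described above for ovos-utterance-normalizer, together with the two permitted return shapes, can be sketched as follows. This is not the plugin's code: `toy_expand` and `toy_normalize` are deliberately trivial, hypothetical stand-ins for its `expand_contractions` and `normalize` helpers, and the function names are assumptions.

```python
from typing import Callable, List


def toy_expand(utterance: str) -> str:
    # Hypothetical stand-in for contraction expansion.
    return utterance.replace("I'm", "I am")


def toy_normalize(utterance: str) -> str:
    # Hypothetical stand-in for normalisation.
    return utterance.lower().strip(" .!?")


def expand_candidates(utterances: List[str],
                      expand: Callable[[str], str] = toy_expand,
                      normalize: Callable[[str], str] = toy_normalize) -> List[str]:
    """Emit expand(u), u, normalize(u) per input, de-duplicated in order."""
    out: List[str] = []
    seen = set()
    for u in utterances:
        for candidate in (expand(u), u, normalize(u)):
            if candidate not in seen:
                seen.add(candidate)
                out.append(candidate)
    return out  # out[0] is the primary candidate for downstream matchers


def expand_in_place(utterances: List[str]) -> List[str]:
    """Same result, but mutating the caller's list (also conformant per §3.2)."""
    utterances[:] = expand_candidates(utterances)
    return utterances
```

Note that with this ordering the primary candidate `utterances[0]` becomes the contraction-expanded form of the original primary, which is the kind of consequence the "utterances[0] is primary" rule makes visible.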

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…olution

Per the new "1 file per PR / self-contained / timeless" policy:

- Reverts the APPENDIX.md / CHANGELOG.md / README.md edits this branch
  accumulated. Each of those belongs in its own dedicated PR (APPENDIX
  per the new "dedicated PR just for appendix" rule; CHANGELOG and
  README as their own scoped PRs).
- Spec body is fully self-contained: no cross-references to APPENDIX
  for normative meaning, all per-§3.x rationales / canonical use
  cases / "where LLMs fit" notes stay in the spec body where readers
  can find them without leaving the file.
- Adds §7.1 Language resolution and the disambiguation hierarchy:
  six-level precedence chain (stt_lang > request_lang > detected_lang
  > data.lang > existing session.lang > config default), valid_langs
  gating, deprecation of top-level Message.context["lang"], per-
  injection-point producer responsibilities. Resolved value is
  written to session.lang. Orchestrator MUST resolve before running
  the §3.2 utterance chain. Bumps §1.1 scope list and §9
  orchestrator conformance to reference §7.1.
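
The six-level precedence chain in the last bullet can be read as a first-match lookup. The Python sketch below is one plausible interpretation, not the normative algorithm: the flat `signals` dict, the `data_lang` and `session_lang` key names, and the reading of valid_langs gating as "skip candidates outside the allowed set" are all assumptions.

```python
from typing import Iterable, Mapping, Optional

# Highest precedence first, as listed in the commit message.
PRECEDENCE = ("stt_lang", "request_lang", "detected_lang",
              "data_lang", "session_lang")


def resolve_lang(signals: Mapping[str, Optional[str]],
                 valid_langs: Iterable[str],
                 config_default: str) -> str:
    """Return the first signal that is present and allowed, else the default."""
    allowed = set(valid_langs)
    for name in PRECEDENCE:
        lang = signals.get(name)
        if lang and lang in allowed:
            return lang
    return config_default


def apply_resolution(session: dict, signals: Mapping[str, Optional[str]],
                     valid_langs: Iterable[str], config_default: str) -> dict:
    """Write the resolved value to session['lang'] before the §3.2 chain runs."""
    merged = dict(signals)
    merged.setdefault("session_lang", session.get("lang"))
    session["lang"] = resolve_lang(merged, valid_langs, config_default)
    return session
```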

PR net diff is now a single file: transformer.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reframes the spec to explicitly accommodate the deployment shape
current OVOS already uses — orchestrator split along the audio
boundary into cooperating processes (audio-input service / utterance-
handling service / audio-output service). From the spec's
perspective these are all "the orchestrator"; the split is a
deployment / containerization choice the spec allows but does not
prescribe.

The substantive consequence: no single orchestrator process has a
global view of all loaded transformers. The audio-input service
doesn't know what dialog transformers the audio-output service
loaded, and vice versa. The introspection surface is designed
around this.

§1 — new framing paragraph: "The orchestrator is a logical role
and MAY be implemented as multiple cooperating processes." Names
the audio-input / utterance-handling / audio-output split current
OVOS uses as the canonical example. Notes the no-global-view
consequence. The transformer_id definition reflects the split:
each process holds its slice of the mapping; the union across
processes is the full loaded set.

§6 — rewritten as broadcast-query / scatter-response. No central
registration index; each orchestrator process answers
transformer.{type}.list and transformer.list with its own local
slice. Responses carry loaded + priorities only (no global chain
order — composition is the §4 priority + §5 override combined
across the union). Pull-only discovery; no register/deregister
handshake. Single-process deployments answer fully from one
reply; split deployments from several.
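
On the consumer side, the scatter-response model means a tool has to merge several replies itself. The sketch below assumes each response payload looks like `{"loaded": [...], "priorities": {...}}`; the commit message names the two fields, but the exact wire types, the function name, and the fallback priority of 50 are assumptions made for illustration.

```python
from typing import Dict, Iterable, List, Mapping


def compose_view(responses: Iterable[Mapping],
                 default_priority: int = 50) -> List[str]:
    """Union the per-process slices and order them by ascending priority."""
    priorities: Dict[str, int] = {}
    for payload in responses:
        reported = payload.get("priorities", {})
        for transformer_id in payload.get("loaded", []):
            # First report wins if two processes announce the same id.
            priorities.setdefault(
                transformer_id, reported.get(transformer_id, default_priority))
    # sorted() is stable, so ties keep first-seen order.
    return sorted(priorities, key=lambda tid: priorities[tid])
```

A single-process deployment passes one payload and gets its own chain back; a split deployment passes one payload per responding process. Per-session overrides (§5) would still need to be applied on top of this default ordering.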

§9 — conformance: each orchestrator process that implements one
or more chains MUST meet the per-process introspection
obligations for the chains it implements. Composition of
per-process responses is the orchestrator's full view.

§10 — non-goal: cross-process invocation topic shape left open.
Spec defines introspection (§6) and IO contracts (§3); deployers
pick whatever cross-process request/response convention fits
their substrate.

§5 — per-session override resolution updated to operate "over the
set of transformers it can reach" (in-process or via another
orchestrator process).
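
A minimal Python sketch of per-session override resolution over the reachable set follows. The field name pattern `session.<type>_transformers` comes from the PR summary; everything else is an assumption made for illustration, in particular that an absent field means "use the deployment default", that unknown identifiers are skipped rather than triggering a fallback, and that an empty list disables the chain.

```python
from typing import Iterable, List


def resolve_session_chain(session: dict, transformer_type: str,
                          reachable: Iterable[str],
                          deployment_default: List[str]) -> List[str]:
    """Return the ordered transformer ids to run for this session and type."""
    override = session.get(f"{transformer_type}_transformers")
    if override is None:
        # No override set: the session costs nothing extra on the wire.
        return list(deployment_default)
    known = set(reachable)
    # Keep the session's order; drop ids this orchestrator cannot reach.
    return [tid for tid in override if tid in known]
```

With this reading, a restricted session that lists only trusted transformers gets exactly those, in the order it asked for, and a typo in one identifier does not silently restore the full default chain.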

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…rce of truth

Refines §6's discovery model to better match async-bus reality:

- Orchestrator processes MAY volunteer a one-shot load-time
  announcement (push) of their loaded set, on the same
  .response topic a pull query would land on. This is a
  convenience for consumers already listening (a monitoring
  service that came up before the orchestrator process).
- Consumers MUST NOT rely on having received any prior
  announcement. Load ordering between producers and consumers is
  not guaranteed — a consumer that starts after the announcement
  fired has missed it, and the bus is async with no catch-up
  channel for missed broadcasts.
- A consumer that needs accurate state MUST query.

§9 conformance updated:
- Per-process MAY clause for load-time announcements.
- New "consumers" MUST clause: query when accuracy matters; do
  not assume announcements reached them.

Replaces the previous strict "MUST NOT spontaneously broadcast"
rule, which over-rotated against a useful convenience pattern
(load-time announcements) that's harmless as long as consumers
don't depend on them.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…raming

Two refinements:

1. Drop the aggregate `transformer.list` query. Only per-plugin-type
   topics remain: transformer.audio.list, transformer.utterance.list,
   transformer.metadata.list, transformer.intent.list,
   transformer.dialog.list, transformer.tts.list. An aggregate would
   imply a single responder with a global view, which this spec
   doesn't assume exists. Consumers wanting multiple types issue
   multiple queries.

2. Generic voice-OS framing instead of "OVOS" references:
   - §1: "natural homes for this kind of work in a voice operating
     system's utterance lifecycle" (was "the OVOS architecture
     recognizes")
   - §1 split-orchestrator paragraph: presents the audio-input /
     utterance-handling / audio-output split as "a natural split
     along the audio boundary" rather than "in current OVOS". The
     pattern is the prescribed shape, not a description of one
     implementation.

§6 + §9 updated for the no-aggregate change: per-type subscriptions
only, no aggregate topic in conformance bullets.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…, lang in SESSION-1

§1: split-orchestrator framing now references PIPELINE-1 §2 instead
   of restating; loses several paragraphs of duplicated narration.
§3.3 metadata transformers: new orchestrator-stamped metadata_by
   audit trail (list of transformer_ids, in chain order, appended
   when context is actually modified). Parallel to cancel_by for the
   cancellation chain. Plugin MUST NOT write it itself.
§5 per-session overrides: drop field restatement; reference
   SESSION-1 §2.1 claim and §2.5 deployment-default fallback.
   Tighten partial-unknown rule (orchestrator MUST NOT fall back to
   deployment default merely because one identifier is unknown).
§7 error handling: MUST log → SHOULD log (logging is deployment
   concern; catch-and-proceed is the load-bearing contract).
§7.1 lang resolution: full disambiguation hierarchy removed —
   moved to SESSION-1 §3.2 (where the lang signals are now claimed
   as session fields). Replaced with one short paragraph naming
   which transformer types are natural producers of which signals.
   Consolidation policy explicitly deferred to SESSION-1 §3.2.7.
§8.1 cancellation: same MUST log → SHOULD log downgrade for shape-
   violation path.
§9 conformance: add metadata_by stamping line; add cancellation
   handling checklist line (closes drift — §8's six MUSTs were not
   restated in conformance).

See also: add SESSION-1.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ata_by SHOULD

§6 introspection topics: rename transformer.<type>.list →
   ovos.transformer.<type>.list (and .response). Aligns with the
   ovos.<domain>.<verb> convention shared with INTENT-4 / PIPELINE-1.
§5 per-session overrides: replace the slim "claims six fields"
   table with a full SESSION-1 §2.1 6-point registry table (wire
   type / propagation / scope / deployment-default per field). The
   gap SESSION-1 §2.1 was written to prevent — now closed.
§3.3 metadata_by: downgrade orchestrator MUST-stamp to SHOULD.
   Detecting "modified context" without a heavyweight change-
   detection contract is impossible in general; best-effort
   traceability is honest. Consumers MUST treat as hint, not
   authoritative provenance.
§9 conformance: corresponding MUST → SHOULD for metadata_by.
§1 split-orchestrator: reduce to one-sentence PIPELINE-1 §2
   reference + one paragraph naming the natural transformer-chain
   partitioning. No restatement of the audio-boundary framing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…y warnings

metadata_by was speculative and could not be reliably enforced
(orchestrator cannot detect in-place mutations without a
heavyweight change-detection contract the spec deliberately did
not define). Even as a hint, it duplicated information already
deterministically available via §6 introspection + chain order.
Removed entirely:
- §3.3 audit-trail prose
- §3.3 best-effort / consumers-treat-as-hint paragraph
- §3.3 cancel_by-analogue framing
- §9 conformance bullet

Replaced with one sentence pointing readers at §6 + chain order
as the deterministic attribution path.

Also reframes the reserved-key guidance: drop the empty
"SHOULD coordinate" wrapper (which contradicted the unlimited-
mutation permission three paragraphs up); keep the bullet list of
consequence-bearing keys as a deliberate-not-blindly note.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… §5.3

Two cross-spec audit fixes:

- §1.2 (new): transformers MUST stamp Message.context['transformer_id']
  on every bus emission, parallel to OVOS-INTENT-4 §3.1's
  context['skill_id'] and OVOS-PIPELINE-1 §3.1's
  context['pipeline_id'] rules. Reserved-key precedence: the three
  identity keys are mutually exclusive on a single Message;
  consumers observing more than one SHOULD treat as malformed.
  Includes the emitter vs subject distinction
  (context['transformer_id'] vs data['transformer_id']) and
  orchestrator-side loader enforcement. Downstream specs
  (CONTEXT-1 §5.2 attribution of transformer-emitted
  ovos.context.set) now have a wire-level source.

- §9 conformance: previously said transformers SHOULD NOT mutate
  session.intent_context directly, with a carve-out only for
  intent transformers. That contradicted CONTEXT-1 §5.3, which
  normatively permits any transformer holding an in-flight session
  to mutate intent_context directly. Rewrote the MAY bullet to
  permit direct mutation for any transformer type, pointing at
  CONTEXT-1 §3 / §5.3 for the key-shape rules (private entries
  prefixed by transformer_id, or skill_id when writing on behalf
  of a specific skill). Also added the §1.2 emission rule to the
  bus-side-effects bullet.

- See also: tightened the CONTEXT-1 entry — direct mutation is no
  longer characterised as 'should not bypass'; intent_context is
  no longer 'read-mostly' for transformers. The choice between
  bus events and direct mutation is the transformer's per §9.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…coexist

Same correction as PIPELINE-1 §3.1: forward/reply preserve upstream
identity stamps, and a transformer running over a chain that
already carries skill_id and/or pipeline_id additionally stamps
transformer_id without stripping the others. The three keys
coexist along the derivation chain; attribution consumers pick one
via the most-specific-wins precedence codified in CONTEXT-1 §5.2.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ation keys

Two structural changes settled in design discussion:

1. §1.1 (new): formal transformer-identity definition. Identity
   is the pair (type, transformer_id). Type is one of six
   (audio, utterance, metadata, intent, dialog, tts) and fixes
   the injection point. transformer_id is opaque, unique within
   its type's registry. Multi-type plugins MAY share a
   transformer_id across types — types are independent registries.
   Wire-encoding of the pair: type via context-key choice,
   transformer_id via the value.

2. §1.3 (replacing the old §1.2 self-identification): the spec
   now claims SIX context keys, one per transformer type
   (audio_transformer_id, utterance_transformer_id,
   metadata_transformer_id, intent_transformer_id,
   dialog_transformer_id, tts_transformer_id) instead of one
   generic transformer_id. Rationale: preserves role across the
   six-stage chain, disambiguates multi-type plugins, mirrors
   SESSION-1's per-type *_transformers partitioning.

The MUST-stamp rule now spans both origination and modify-in-place
(transformers' common mode is mutating the message they were
handed; that act binds the stamp obligation just as bus emission
does). Overwrite-last applies within a single type's chain;
across types, the six keys coexist. Identity keys from other
component-types (skill_id, pipeline_id) coexist with the
transformer keys — none stripped by derivation.

Old §1.1 Scope is now §1.2 (Scope is unchanged in content).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ce + derivations stamp

Two structural refinements:

1. The six self-identification context keys move from single-string
   to list-valued, renamed plural: audio_transformer_ids,
   utterance_transformer_ids, metadata_transformer_ids,
   intent_transformer_ids, dialog_transformer_ids,
   tts_transformer_ids. Each holds an ordered list of
   transformer_ids that touched the Message, preserving full
   chain provenance per type on the wire (where previously
   overwrite-last lost earlier-chain entries). Rationale: chains
   are the defining transformer shape; the wire should carry the
   history. skill_id and pipeline_id stay as single strings —
   they originate, they don't chain.

   The 'ensure self is last element' formulation makes the stamp
   rule self-idempotent and handles origination, modify-in-place,
   and re-entry uniformly.

2. The stamp rule applies to derivations (Message.forward / .reply
   / .response) when the transformer is the component performing
   the derivation and placing the resulting Message on the bus.
   The derivation mechanism is irrelevant — what matters is that
   the transformer caused a Message to appear on the bus.

§1.1 wire-encoding note and the cancel_by stamping reference in
§9 conformance updated to plural names. The singular
<type>_transformer_id naming is now explicitly not used.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Existing wording said producers 'MUST NOT populate ... values that
match the deployment default'. SESSION-1 §3.4 establishes the
canonical rule as SHOULD (not MUST) and lists [] as wire-equivalent
to omission. Aligned: now points at §3.4 (corrected reference,
previously §3.3 before renumbering) and uses SHOULD.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…tion

Completing the blacklist family for symmetry with PIPELINE-1 §5:
PIPELINE-1 owns blacklisted_skills / blacklisted_intents /
blacklisted_pipelines; TRANSFORM-1 now owns the per-type
transformer denylists, one per injection point.

§5 restructured into three subsections:

- §5.1 Per-type chain ordering (existing *_transformers, the
  preference channel — unchanged in content, framed as
  preference).
- §5.2 Per-type denylists (new): blacklisted_audio_transformers,
  blacklisted_utterance_transformers, blacklisted_metadata_transformers,
  blacklisted_intent_transformers, blacklisted_dialog_transformers,
  blacklisted_tts_transformers. Orchestrator-only single-tier
  filter (no two-tier backstop like skills/intents, because
  transformers don't return match candidates — orchestrator
  drives the chain directly). Policy overrides preference.
- §5.3 Composition mirrors PIPELINE-1 §5.5 three-stage layering:
  preference (from §5.1 or deployer default) → availability
  (drop unloaded) → policy (drop denylisted). Per injection
  point. Empty effective chain = no transformers run at that
  stage, artifact passes through. Layer-2 substrate authorization
  framing parallels PIPELINE-1 §5.6.

All twelve session fields follow the SHOULD-omit / []-equivalent-
to-omission rule of OVOS-SESSION-1 §3.4.
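
The three-stage layering described for §5.3 can be sketched as below. This is a minimal illustration, not the spec's reference code: the function name `effective_chain` and its parameters are hypothetical, and treating an empty session list as "defer to the deployer default" is an assumption drawn from the []-equivalent-to-omission rule cited above.

```python
from typing import Iterable, List, Optional


def effective_chain(
    session_pref: Optional[List[str]],
    deployer_default: List[str],
    loaded: Iterable[str],
    denylist: Iterable[str],
) -> List[str]:
    """Compose one injection point's chain: preference -> availability -> policy.

    All names here are hypothetical; the spec only fixes the layering order.
    """
    loaded_set = set(loaded)
    denied = set(denylist)
    # Stage 1 (preference): an omitted or empty session field defers to the
    # deployer default (assumption: [] is treated exactly like omission).
    chain = session_pref if session_pref else deployer_default
    # Stage 2 (availability): drop transformers that are not loaded.
    chain = [t for t in chain if t in loaded_set]
    # Stage 3 (policy): drop denylisted transformers; policy overrides preference.
    return [t for t in chain if t not in denied]
```

An empty result means no transformers run at that stage and the artifact passes through unchanged.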

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
A transformer that .forward's a Message it did not modify MUST
NOT append its own transformer_id for that derivation — the
inherited <type>_transformer_ids list rides through untouched,
preserving upstream chain provenance. Modify-in-place still
binds the stamp obligation; .reply and .response are authorial
and MUST stamp.

Symmetric with the analogous rules in PIPELINE-1 §3.1
(pipeline_id) and INTENT-4 §3.1 (skill_id). Consistent
forward-is-propagation semantics across all three
component-identity surfaces.
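
A rough sketch of the forward-is-propagation rule, using a plain dict as a stand-in for Message.context. The helper `derive_context` and its keyword arguments are hypothetical and exist only to illustrate the rule; they are not part of any OVOS API.

```python
from copy import deepcopy
from typing import Any, Dict


def derive_context(
    context: Dict[str, Any],
    transformer_type: str,
    transformer_id: str,
    *,
    derivation: str,  # "forward", "reply" or "response"
    modified: bool,
) -> Dict[str, Any]:
    """Return the context for a derived Message, stamping only when required."""
    new_context = deepcopy(context)
    key = f"{transformer_type}_transformer_ids"
    # Forwarding an unmodified Message is propagation: the inherited list
    # rides through untouched, preserving upstream chain provenance.
    if derivation == "forward" and not modified:
        return new_context
    # .reply / .response are authorial, and modifying the Message binds the
    # stamp obligation, so the transformer appends its own id.
    new_context.setdefault(key, []).append(transformer_id)
    return new_context
```

Other identity keys already present in the context (for example skill_id or pipeline_id) are copied through unchanged in every branch.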

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
JarbasAl and others added 7 commits May 26, 2026 14:34
'The most common LLM hook today' is temporal meta-commentary.
Replace with 'A natural injection point for language models',
which is a timeless characterization.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Resolves the long-standing footgun where transformers had no
direct access to the utterance/dialog/TTS content language and
were reaching for session.lang (user preference, not content) or
relying on the orchestrator copying data.lang into context.lang
as a workaround.

New §3.0: common contract for the lang parameter across the
three text-bearing chains (§3.2 utterance, §3.5 dialog, §3.6
TTS). Orchestrator sources lang from Message.data.lang; passes
it through when present; MUST NOT synthesize from session.lang
or other signals when absent. Consumer (transformer) decides
how to resolve absence per its own policy.

Audio (§3.1), metadata (§3.3), and intent (§3.4) transformers
do not receive the parameter — audio is pre-STT, metadata is
context-only with no artifact, intent has Match.lang by
construction.

Per-type Input sections updated for §3.2, §3.5, §3.6.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Audio transformers can legitimately receive lang when the
producer authoritatively knows it — a UI language selector, an
upstream audio language-detector plugin, a device-configured
language, a test fixture, or an STT decoder run with a fixed
language hypothesis. The spec makes no claim about source;
presence alone is authoritative.

Updated §3.0 to list four chains (audio, utterance, dialog, TTS)
that receive lang. Metadata (§3.3) and intent (§3.4) remain
excluded for their existing reasons (context-only artifact;
Match.lang already authoritative).

§3.1 audio transformer Input updated to include lang.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…s back to data.lang

A bidirectional-translation transformer (and any other transformer
that mutates the artifact's language — language-detectors that set
lang where it was None, transformations that obscure the prior
tag) needs to communicate the post-transformation language to
downstream stages. Making lang only an input meant data.lang
diverged from the artifact's actual language after translation.

§3.0 now specifies bidirectional propagation:

- The orchestrator passes lang in on each transformer call.
- Each transformer returns a possibly-mutated lang alongside its
  modified artifact and context.
- The orchestrator threads (artifact, lang) into the next
  transformer in the chain.
- At chain end, the orchestrator MUST write back the final lang
  value to Message.data.lang on the relevant Message: it sets
  data.lang when the value is non-None, and unsets it when the
  value is None and the field was previously present. The chain's
  conclusion is authoritative for downstream stages.

The orchestrator still MUST NOT synthesize lang from session.lang
or other signals — only artifact/transformer flow.

§3.1 (audio), §3.2 (utterance), §3.5 (dialog), §3.6 (TTS) Output
sections updated to list lang explicitly alongside the artifact
and context.
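
The bidirectional propagation above might look roughly like this on the orchestrator side. It is a sketch under stated assumptions: the transformer call signature, the `run_chain` helper, and the use of a plain dict for Message.data are all hypothetical and chosen only to make the threading and write-back steps concrete.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

# Hypothetical call shape: (artifact, lang, context) -> (artifact, lang, context)
Transformer = Callable[
    [Any, Optional[str], Dict[str, Any]],
    Tuple[Any, Optional[str], Dict[str, Any]],
]


def run_chain(
    chain: List[Transformer],
    artifact: Any,
    data: Dict[str, Any],
    context: Dict[str, Any],
) -> Tuple[Any, Dict[str, Any]]:
    """Thread (artifact, lang) through the chain, then write lang back to data."""
    # lang comes only from the message data; it is never synthesized from
    # session.lang or other signals when absent.
    lang = data.get("lang")
    for transform in chain:
        artifact, lang, context = transform(artifact, lang, context)
    # Write-back: set data["lang"] when non-None, unset it otherwise.
    if lang is not None:
        data["lang"] = lang
    else:
        data.pop("lang", None)
    return artifact, context
```

A language detector can fill in a missing lang and a translator can replace it; either way, downstream stages read the chain's final value from data.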

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ncel_reason vocabulary; drop §1.3 idempotency clause

Three audit findings actioned:

- §3.3 metadata transformer routing-key mutation: previously
  'MAY mutate however it sees fit' with a mild 'consequences
  worth being deliberate about' disclaimer. Tightened to
  SHOULD NOT mutate source/destination unless the transformer's
  deliberate role is re-routing; transformers that do MUST
  understand the MSG-1 §5 derivation consequences.

- §8.1 cancel_reason vocabulary: was free-form by default,
  making observability/audit brittle. Minted five reserved values
  (stop_word, transcription_invalid, policy_block,
  parental_control, other) for the common cases; transformers
  SHOULD use one. Free-form remains conformant, but deployers are
  encouraged to coordinate. other is the universal fallback for
  transformers that don't want to think about vocabulary.

- §1.3 list-append rule: drop the 'ends with self -> no-op'
  idempotency clause. The intended invariant is 'list records
  every touch including re-entry', preserving chain provenance
  verbatim. The no-op-on-re-entry rule destroyed exactly the
  signal the list exists to capture. Consumers that want to
  collapse runs MAY do so at read time.
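
As an illustration of the §1.3 change, the sketch below appends on every touch and leaves run-collapsing to the reader. The helpers `stamp` and `collapse_runs` are hypothetical names, and a plain dict stands in for Message.context.

```python
from itertools import groupby
from typing import Any, Dict, List


def stamp(context: Dict[str, Any], transformer_type: str, transformer_id: str) -> None:
    """Record one touch; re-entry appends again instead of being a no-op."""
    key = f"{transformer_type}_transformer_ids"
    context.setdefault(key, []).append(transformer_id)


def collapse_runs(ids: List[str]) -> List[str]:
    """Read-time view that merges consecutive repeats of the same id."""
    return [tid for tid, _ in groupby(ids)]
```

The list on the wire keeps every touch verbatim; only the consumer's view is collapsed, so non-adjacent re-entries remain visible.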

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CONVERSE-1 §3.4 (PR #25) explicitly cites the metadata-transformer
hook as the recommended position for mutating session.active_handlers
and session.response_mode. Add this to the §3.3 list of permitted
mutations, with the §5.4 cancellation-semantics back-reference for
mid-wait holder changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The "on cancellation per §8 — stop the current chain..." bullet was
a near-verbatim duplicate of the preceding bullet. Merged the two
into one complete statement.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@JarbasAl force-pushed the spec/transformer branch from 8ac8212 to 270c465 May 26, 2026 13:38
JarbasAl added a commit that referenced this pull request May 26, 2026
Per the new "dedicated APPENDIX PR" policy, consolidating the
prior-art and design-deviation notes from the OVOS-CONTEXT-1
(PR #18) and OVOS-TRANSFORM-1 (PR #20) work into this PR.
Those spec PRs are now scoped to their own spec files only;
the discussion / cross-spec touchups / in-tree prior art
all live here.

Adds to §4 Design rationale:
- "Intent context (CONTEXT-1)" — the Adapt-only origins, the
  two-scope (private/shared) formalization, jurebes /
  nebulento / palavreado as prior art for excludes_context,
  the engine-side §5.3 mutation pathway resolving the
  PIPELINE-1 §4.2 contradiction.
- "Transformer plugins (TRANSFORM-1)" — the architectural-
  pattern framing, intent transformers as the system-typing
  home, the nine concrete in-tree plugins as prior art, the
  ascending-vs-descending priority deviation called out,
  cancellation alignment with existing plugin convention,
  and the language disambiguation hierarchy mirroring current
  ovos-core code paths.

Removes from §7 Known gaps:
- "Intent context" bullet (formalized in CONTEXT-1).
- "The utterance-transformer chain" bullet (formalized in
  TRANSFORM-1).
JarbasAl added a commit that referenced this pull request May 26, 2026
Split specs into intent / bus / orchestrator stacks. Add all 11
specs including in-review ones (INTENT-4 #9, INTENT-2 v3 #4,
TRANSFORM-1 #20, CONTEXT-1 #18, CONVERSE-1 #25). Add role-based
reading order.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
JarbasAl added a commit that referenced this pull request May 26, 2026
* docs: README — full spec-set refresh for the in-flight stack

Update the README to reflect the full spec set landing together:
the original intent stack (INTENT-1/-2/-3, MSG-1) plus the
in-flight specs (INTENT-4, SESSION-1, SESSION-2, PIPELINE-1,
TRANSFORM-1, CONTEXT-1, CONVERSE-1).

Changes:

- Specification table reorganised into three stacks — intent,
  bus, orchestrator — each with a one-paragraph narrative. This
  is the structure APPENDIX §1.2 already uses; the README now
  mirrors it for consistency.
- New 'Where to start' section with four reading-order paths
  matching common audiences: skill author, plugin author,
  orchestrator author, architecture surveyor. Addresses the
  'no clear entry point' friction first-time readers hit when
  the set went from 4 to 11 specs.
- New 'How this compares to other voice frameworks' section
  summarising APPENDIX §2's positioning (Home Assistant /
  hassil, Rasa, Alexa / Dialogflow, Rhasspy / Hermes, Wyoming).
  Brief; points at APPENDIX for detail.
- Reference-implementation section split: ovos-spec-tools
  covers the intent stack; bus and orchestrator stacks are
  acknowledged as not-yet-having-ground-up-reference-impl with
  pointer to APPENDIX §5 divergence catalogue.
- New 'Implementation status' section: clarifies the spec-set
  Draft→stable transition is tracked at #5; intent stack is
  most aligned with current ovos-core; known gaps cited from
  APPENDIX §7.
- Contributing section adds the one-file-per-PR rule (per
  AGENTS.md repo convention) and clarifies dev vs master
  targeting.
- Updated draft warning to reference APPENDIX §5 divergence
  catalogue and link to #5.

No normative-spec changes; README and supporting-metadata only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* README: establish voice OS framing; add OS-analogy table

Replace "voice assistant ecosystem" opening with "voice operating
system" framing. Add "What a voice operating system is" section with
OS-analogy table (scheduler, IPC, shared memory, process supervision,
loadable modules, syscall ABI) and the portability consequence.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* README: full spec table — three stacks, open PR links

Split specs into intent / bus / orchestrator stacks. Add all 11
specs including in-review ones (INTENT-4 #9, INTENT-2 v3 #4,
TRANSFORM-1 #20, CONTEXT-1 #18, CONVERSE-1 #25). Add role-based
reading order.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
JarbasAl added a commit that referenced this pull request May 26, 2026
… PIPELINE-1 (#14)

* docs: APPENDIX — audit-driven corrections (pipeline + registration model)

Applies corrections found by auditing claims against actual OVOS
source code:

1. **§6.7 enable/disable_intent legacy names corrected** to the
   real `mycroft.skill.enable_intent` / `mycroft.skill.disable_intent`.

2. **§6.4 direct-bus-subscribe claim broadened** — verified the
   standard ovos-padatious-pipeline-plugin and
   ovos-adapt-pipeline-plugin both subscribe directly to
   registration topics, not just downstream plugins.

3. **§6.4 "side-effects during match" softened** — audit confirms
   the official match_* methods are already side-effect-free; the
   skill-activation emit is orchestrator-side, not plugin-side.
   Rule reframed as forward-looking discipline.

4. **§3 / §4 / §6.4: PIPELINE-1 *refines* the plugin model rather
   than *introducing* it.** OVOSPipelineFactory, pipeline_plugins
   dict, _PIPELINE_MIGRATION_MAP, and the official plugin set
   already exist. PIPELINE-1's actual contribution narrows to:
   formalizing the contract, `<owner_id>:<intent_name>`
   polymorphism, universal `ovos.utterance.handled` end-marker,
   and the renames.

5. **§3 / §4 / §6.4: tier convention is compatible, not a
   divergence.** From the bus each tier is already a distinct
   `pipeline_id` in `Session.pipeline`. How a Python plugin class
   internally serves multiple `pipeline_id`s (one class with
   match_high/medium/low methods, an orchestrator-side
   suffix-decoder, separate plugin instances, etc.) is
   implementation choice the spec does not constrain.

6. **§4 / §6.4: registrations-are-broadcast is compatible, not a
   divergence.** OVOS already broadcasts registrations on the
   bus; plugins already subscribe directly. INTENT-4 does not
   change this — it only renames topics into the `ovos.intent.*`
   namespace (see §6.7). Migration is a string replacement.
   What IS new is the orchestrator's passive registration index
   that backs `ovos.intent.list` / `.describe` — that's added as
   a separate §6.4 divergence ("new orchestrator responsibility,
   not a change to existing behaviour").

7. **§6.6 adds note on engine-specific introspection topics**
   (`intent.service.adapt.*`, `intent.service.padatious.get`) —
   plugin-defined surface; spec does not claim authority over
   them.

No spec-body changes; APPENDIX only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: APPENDIX §6.4 — drop the "dissolution" divergence

Same logic as the broadcast-registrations correction: the
orchestrator already treats every loaded plugin uniformly, and
`IntentHandlerMatch.match_type` is an opaque string the plugin
chooses — nothing in current code prevents a plugin from setting
`match_type = "<pipeline_id>:<intent_name>"` and being dispatched
to itself. The `<owner_id>:<intent_name>` polymorphism PIPELINE-1
names is therefore already supported; the spec only writes down a
convention current code allows but does not document.

Design rationale around the polymorphism stays in §3/§4 — it is
useful explicit naming. But it is not a divergence and should not
sit in the divergence catalogue.

§6.4 now contains a single real divergence: the orchestrator's
new passive registration index backing `ovos.intent.list` /
`.describe`. Everything else in §6.4 is forward-looking
discipline or a workshop bug, not an architectural change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* APPENDIX: keep session.pipeline (revert the rename row)

PIPELINE-1 now keeps the existing `session.pipeline` field name
instead of renaming it to `pipeline_stages`. Drop the §6.2
rename row and revert the prose mentions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* APPENDIX §7: note utterance-transformer chain as a deferred spec (out of scope for PIPELINE-1)

* APPENDIX §4 / §7: design notes for OVOS-CONTEXT-1 and OVOS-TRANSFORM-1

Per the new "dedicated APPENDIX PR" policy, consolidating the
prior-art and design-deviation notes from the OVOS-CONTEXT-1
(PR #18) and OVOS-TRANSFORM-1 (PR #20) work into this PR.
Those spec PRs are now scoped to their own spec files only;
the discussion / cross-spec touchups / in-tree prior art
all live here.

Adds to §4 Design rationale:
- "Intent context (CONTEXT-1)" — the Adapt-only origins, the
  two-scope (private/shared) formalization, jurebes /
  nebulento / palavreado as prior art for excludes_context,
  the engine-side §5.3 mutation pathway resolving the
  PIPELINE-1 §4.2 contradiction.
- "Transformer plugins (TRANSFORM-1)" — the architectural-
  pattern framing, intent transformers as the system-typing
  home, the nine concrete in-tree plugins as prior art, the
  ascending-vs-descending priority deviation called out,
  cancellation alignment with existing plugin convention,
  and the language disambiguation hierarchy mirroring current
  ovos-core code paths.

Removes from §7 Known gaps:
- "Intent context" bullet (formalized in CONTEXT-1).
- "The utterance-transformer chain" bullet (formalized in
  TRANSFORM-1).

* APPENDIX: SESSION-1 rationale; introspection patterns; revised divergences

§4 — new 'Session (SESSION-1)' rationale subsection: why it exists,
   prescriptive-not-descriptive scope, omission-as-deferral
   semantics, four language signals.
§4 'Transformer plugins' — language-disambiguation note updated:
   hierarchy moved out of TRANSFORM-1 to SESSION-1 §3.2; transformer
   types now just named as natural producers of signals,
   consolidation is consumer's stage-dependent choice.
§6.4 architectural divergences — add: handler-trio ownership shifted
   to orchestrator (third-party handler code carries no obligation);
   per-pipeline_id intent introspection (PIPELINE-1 §10); CONTEXT-1
   scope discriminator. Update ovos.utterance.handled note to
   reflect the trio-ownership shift (workshop fix is now in the
   wrapper, not the handler).
§6.5.1 (new) — introspection-patterns table comparing INTENT-4,
   PIPELINE-1, CONTEXT-1, TRANSFORM-1 surfaces. Three shared
   properties (pull-query is source of truth, no completeness
   signal, per-process slices under split orchestrators). Notes
   naming-convention inconsistency as candidate follow-up.
§6.6 — remove obsolete 'session shape deferred' note; replace with
   SESSION-1 ownership statement.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* APPENDIX: update §6.5.1 topic-naming (resolved); add new §6.4 divergences

§6.5.1: topic-naming inconsistency is now resolved — all four .list
   surfaces use ovos.<domain>.<verb>. Update the table and replace
   the 'not yet uniform' note with a rename log.
§6.4: add four new divergence entries:
   - Skill self-identification on every emission (INTENT-4 §3.1)
   - recognizer_loop:utterance de-prescribed (PIPELINE-1 §9.1)
   - .list topics standardized
   - (keeps the existing scope-discriminator / handler-trio /
     per-pipeline_id / utterance.handled entries)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* APPENDIX: cleanup — drop draft-history meta-commentary

Stand-alone design notes, not a changelog.

§4 design rationale: rewrite Session block and TRANSFORM-1 lang
   bullet to describe current design, not 'moved from earlier draft'.
§6.4 divergences: rewrite handler-trio / trio-ownership / scope-
   discriminator / skill_id-emission / recognizer_loop /
   topic-naming entries to state current design, not contrast with
   earlier drafts.
§6.5.1 introspection patterns: drop 'in this round' rename note.
§9 (rewritten 'Design history' → 'The spec set, in three stacks'):
   drop §9.3 audit-driven-refinement entirely (changelog content);
   merge §9.1 + §9.2 into one tighter section about how the eight
   specs partition and what reference implementations exist.
§10 compatibility levels: soften 'was previously spoken of at' to
   'is spoken of at'; replace the 'no longer describes' framing
   with a forward-looking 'tuple covering all eight specs is a
   planned follow-up'.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* APPENDIX: update divergence catalog for CONTEXT-1 key-shape collapse + dispatch stamping

§6.4: rewrite the CONTEXT-1 scope-discriminator entry to reflect
   the bigger change — scope AND origin both collapsed into the
   key shape. requires_context discriminator is the surviving
   surface (default private).
§6.4: rewrite the skill_id-on-every-emission entry to lead with
   the structural enforcement (dispatch stamping + forward/reply
   inheritance), with loader interception as a follow-up rather
   than the primary path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* APPENDIX: clarify topic-naming claim as prefix-uniform, verb depth varies

* APPENDIX §6.5.1: flag the 'intent' word collision across three introspection topics

Cross-spec audit B1: 'intent' plays three different roles across
the four-spec introspection table — registered intents (INTENT-4),
compiled-in-a-matcher intents (PIPELINE-1), and intent-transformer
plugins (TRANSFORM-1). The shapes are deliberate and the payloads
are distinct, but the topic strings read confusingly at a glance.
Added an informative paragraph naming the three meanings and
clarifying that ovos.transformer.intent.list follows the per-chain
ovos.transformer.<type>.list pattern, where 'intent' is the chain
type — not a listing of intents.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* APPENDIX §4 Transformer: design note on the six per-type self-identification keys

Document the rationale for TRANSFORM-1 §1.3 claiming six per-type
context keys (audio_transformer_id, utterance_transformer_id, ...)
rather than a single generic transformer_id. Two arguments: (1)
role preservation across the six-stage chain, mirroring the
per-type partition that already exists in §1.1 registries, §5
session overrides, and §6 introspection topics; (2) multi-type-
plugin disambiguation, since §1.1 permits a single transformer_id
across types and a generic context key would erase the role at
emit time.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* APPENDIX §4 Transformer: record list-valued attribution, denylist symmetry, and the per-type field-count tradeoff

Four design notes capturing the recent TRANSFORM-1 evolution:

- Update the existing per-type self-id bullet to reflect the
  plural list-valued context keys (audio_transformer_ids etc.,
  not the older singular names).
- New bullet: list-valued attribution preserves full chain
  provenance per type; the last entry is the most-recent stamp.
  Skills and pipelines stay single-string because they originate
  rather than chain.
- New bullet: per-type denylists (six blacklisted_*_transformers)
  complete the policy surface, mirroring PIPELINE-1's
  pipeline/blacklisted_pipelines pair. Three-stage composition
  (preference → availability → policy) parallels PIPELINE-1 §5.5.
- New bullet: acknowledge the per-type 'explosion' (12 session
  fields + 6 context keys), defend the choice against the
  transformer_<type>:<name> prefix-encoding alternative (direct
  lookup vs prefix parsing), note that SHOULD-omit makes the
  common case zero-cost on the wire, and document the
  object-valued form as a clean fallback if the field count ever
  proves painful in practice.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* APPENDIX §4 CONTEXT-1: rationale for default-private scope

Add design-rationale paragraph explaining why ovos.context.set
defaults to private scope when the canonical worked example
(Person → Bob) is naturally cross-skill. Three reasons: migration
fidelity (current Adapt set_context is effectively skill-private),
safer footgun direction (accidental shared-leak is harder to
debug than accidental cross-skill miss), and authorability
(cross-skill coordination deserves a conscious explicit scope).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* APPENDIX §6: record recognizer_loop:utterance -> ovos.utterance.handle rename

Move the entry-topic from §6.1 'already aligned' to §6.4
'architectural divergences' — it is no longer a name kept
verbatim, since PIPELINE-1 §9.1 now prescribes
ovos.utterance.handle. Rationale paragraph cites the three
MSG-1 §2.1.2 naming convention violations: ':' as separator,
implementation-role leading segment, missing request/terminal
verb pairing.

Migration cost spelled out (every audio-input service emits,
every intent-service handler subscribes: ovos-dinkum-listener,
ovos-simple-listener, ovos-audio, ovos-core/intent_services).

§6.7 predecessor-topic table updated.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* APPENDIX: §2.5 Rasa/hassil/ASK/Mycroft comparisons; §6.5.2 session-field + stamp-rule cheat-sheets

Two informative additions:

- §2.5 (new): extends the §2 comparison set with Rasa, hassil,
  ASK / Dialogflow, and Mycroft. Locates the CONTEXT-1 design
  against Rasa's policy-engine-coupled forms; locates
  TRANSFORM-1 §3.4 against ASK/Dialogflow built-in entity types
  as the injectable open contract; documents Mycroft as the
  predecessor whose ad-hoc model the spec family formalizes.

- §6.5.2 (new): session-field cheat-sheet consolidating the 26
  fields claimed across SESSION-1, PIPELINE-1, TRANSFORM-1, and
  CONTEXT-1 into a single reference table — owner spec, role
  (preference / policy / signal / identity), empty-array
  semantics. Followed by a stamp-rule cheat-sheet covering the
  three component-identity context-key surfaces (skill_id,
  pipeline_id, <type>_transformer_ids) and their behaviour
  across origination, .reply / .response, and .forward.

Both reduce cross-spec bouncing for implementers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* APPENDIX: reorganize from 10 sections to 7, restructure for flow

The appendix had become a dumping ground after multiple rounds
of additions. Restructured with clear narrative flow:

§1 About the OVOS specifications — formalization framing,
   the three-stack overview (was §9), compatibility levels
   (was §10), reference implementations + ecosystem tooling
   (folds in ovos-spec-tools from §9 and ovos-localize from §8).

§2 Comparison with other voice-assistant systems — merges
   the HA/Rhasspy material (was §2) with the Rasa/ASK/
   Dialogflow/Mycroft/hassil material (was §2.5) into a
   single comparator section, ordered by relevance: HA &
   Rhasspy (shared lineage) → open-vs-closed structural
   argument → Mycroft (predecessor) → Rasa (CONTEXT-1
   comparator) → ASK/Dialogflow → hassil (grammar-only) →
   summary of where OVOS leads/follows/differs.

§3 Architectural patterns — the bus as substrate (was §5)
   and the pipeline-plugin model (was §3) grouped as the
   two cross-cutting architectural moves. Bus-substrate
   section gains an explicit subsection on the layer-2
   authorization story (preference / policy split).

§4 Design rationale, per specification — was §4 itself but
   now systematically per-spec (INTENT-1+2+3 grouped,
   MSG-1, SESSION-1, INTENT-4, PIPELINE-1, CONTEXT-1,
   TRANSFORM-1). Stale references purged; recently added
   rationales (most-specific-wins precedence, bidirectional
   lang propagation, per-type denylists, etc.) folded in.

§5 Where the specs differ from current OVOS code — was §6
   but reorganized: removed the §6.5.1 introspection-
   patterns table and §6.5.2 cheat-sheets (they aren't
   divergences from code, they're implementer reference —
   moved to §6). Renumbered to §5.1–§5.7.

§6 Implementer reference — new top-level section gathering
   the cross-spec reference tables that were scattered:
   topic-name conventions (with the 'intent' overload
   clarification), session-field cheat-sheet,
   component-identity stamp-rule cheat-sheet, introspection
   patterns table. These don't belong inside a 'divergences
   from code' section; they're how-to material for fresh
   implementers.

§7 Known gaps and planned work — unchanged content, last
   section. Trimmed stale entries about CONTEXT-1 and
   TRANSFORM-1 as 'planned' (they've shipped); added
   conversation-level evaluation infrastructure as a gap.

Net: same content, far more navigable. Cross-references
updated throughout.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* APPENDIX §2: drop Mycroft comparator subsection; renumber 2.4-2.7 to 2.3-2.6

Mycroft AI Inc shut down in 2023; the fork is years old and the
intervening design is not Mycroft's. Keeping a 'comparison to
predecessor' subsection over-attributes the architecture and
mis-frames OVOS as a derivative project rather than a long-
running open project in its own right.

Section §2 is now a comparison with currently-relevant
voice-assistant systems only:

- §2.1 Home Assistant and Rhasspy (shared grammar lineage)
- §2.2 Closed domain vs open ecosystem
- §2.3 Rasa
- §2.4 Amazon ASK / Google Dialogflow
- §2.5 hassil
- §2.6 Summary

Collateral: dropped Mycroft from the project-name list in the
intro and from the comparator enumeration in the §2.6 summary.
Legacy topic strings that happen to contain 'mycroft' in their
literal name remain in the §5 divergence tables and §5.7
predecessor-topic mapping as factual code references.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* APPENDIX §3.3: external-protocol interoperability injection points

Make the family's interop story explicit rather than implied. New
§3.3 catalogues three injection points where external protocols
plug into the spec family:

1. Pipeline plugins as the dispatch-layer adapter — LLM APIs
   (OpenAI Chat Completions and compatible), deterministic
   template matchers (hassil), external intent classifiers,
   agent-tool protocols (MCP).
2. Transformer chains as the artifact-pipeline adapter —
   bidirectional translation, STT validators, content-policy
   filters, acoustic-event detectors.
3. Bus boundary as the wire-level adapter — Wyoming
   bridges, MQTT-based stacks, HiveMind-style layer-2
   substrates.

Per-protocol notes for Wyoming, OpenAI, MCP, hassil, MQTT,
A2A — naming where each plugs in. The single-flip routing and
no-central-state stance (§3.1) are what make the bus-boundary
adapter feasible without modifying the assistant core.

Concrete suggestion: a translation tool between OVOS-INTENT-2
locale resources and HA's hassil/intents YAML would let the
two corpora cross-pollinate mechanically. Added to §7 known
gaps as planned tooling.

The three injection points are intentionally not exhaustive —
they're the points the spec family deliberately keeps clean. A
protocol needing deeper integration is a signal of
architectural overlap rather than complementarity.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* APPENDIX: add CONVERSE-1 to orchestrator-stack narrative; close multi-turn gap

OVOS-CONVERSE-1 (PR #25) fills the multi-turn conversation gap that §7 previously listed as planned work. Update §1.2 stack description to include it, and drop the §7 gap entry.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* APPENDIX §5.3, §5.4: update for PIPELINE-1 §4.2 relaxation + §7.0 polymorphism collapse

Two divergence-catalogue entries updated to reflect the
PIPELINE-1 restructure:

- The §5.4 'side-effect-free during match' entry is rewritten
  as 'match contract is the single obligation' — match's only
  MUST is returning Match-or-null; bus emissions during match
  are allowed; session mutation during match is via
  Match.updated_session (explicit channel).

- New §5.4 entry: 'Match.updated_session as the match-phase
  session channel' — promotes the existing ovos-core code
  pattern `sess = match.updated_session or
  SessionManager.get(message)` to a normative Match field.
  Claiming plugin's mutations land; declined plugin's
  mutations drop at the boundary.

- The §5.3 'Dispatch payload uses polymorphic owner_id' entry
  is rewritten as 'unified owner_id' — reflects PIPELINE-1
  §7.0's collapse to two handler-owner shapes (plain skill,
  pipeline plugin with bundled handlers where pipeline_id ==
  skill_id) plus the pure-matcher recognition. Notes the
  conceptual mapping skill_id ≈ voice_app_id, pipeline_id ≈
  matching-engine id.
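
Illustrative sketch of the Match.updated_session entry above, not
the ovos-core implementation. Session, Match, and resolve_session
are hypothetical stand-ins invented for this example. Only the
`match.updated_session or <current session>` fallback is taken from
the commit text. It shows a claiming plugin's session landing while
a declining plugin contributes nothing.

```python
from dataclasses import dataclass, replace
from typing import Callable, Optional, Sequence, Tuple


@dataclass
class Session:
    # Hypothetical minimal session; real sessions carry many more fields.
    session_id: str
    lang: str = "en-US"


@dataclass
class Match:
    # Hypothetical minimal Match; only the two fields named in the text.
    skill_id: str
    updated_session: Optional[Session] = None


# A matcher returns a Match to claim the utterance, or None to decline.
Matcher = Callable[[str, Session], Optional[Match]]


def resolve_session(
    matchers: Sequence[Matcher], utterance: str, current: Session
) -> Tuple[Optional[Match], Session]:
    """Return the first claiming Match and the session to carry forward."""
    for matcher in matchers:
        match = matcher(utterance, current)
        if match is None:
            # Declined: nothing from this plugin crosses the boundary.
            continue
        # Claimed: prefer the plugin's explicit session, else keep current.
        return match, (match.updated_session or current)
    return None, current


def decline(utterance: str, session: Session) -> Optional[Match]:
    return None


def claim(utterance: str, session: Session) -> Optional[Match]:
    # Build a new session rather than mutating the shared one in place.
    return Match("demo.skill", replace(session, lang="pt-PT"))


base = Session("s1")
matched, session_after = resolve_session([decline, claim], "ola", base)
```

Returning a fresh Session through the Match, rather than mutating
the shared one, is what keeps the channel explicit in this sketch:
the caller alone decides which session is carried forward.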

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* APPENDIX §1.2, §7: SESSION-2 fills the lifecycle gap

OVOS-SESSION-2 (in flight at PR #27) defines session lifecycle
and state ownership. Update:

- §1.2 orchestrator-stack narrative adds SESSION-2 to the stack
  description with one-line summary of its scope (stateless
  orchestrator for named sessions, orchestrator-owned default
  session, projection mandate).

- §7 gap entry rewritten: SESSION-2 lands the lifecycle piece;
  what remains deferred is the set of session preference fields
  that need to be claimed under SESSION-1 §2.1 by their owning
  specs (preferences / OCP / persona / locale).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* APPENDIX §1.2: SESSION-2 narrative — SHOULD-project + MAY-internal (not 'mandate')

Sync with SESSION-2 §2.4 relaxation (commit 6a882c8). The
projection pathway is SHOULD-when-practical; plugins MAY hold
internal state with full lifecycle ownership and best-effort
resumption.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* APPENDIX §5.2.1: document ovos.session.sync / update_default for removal

These ovos-core topics are not defined by any spec. SESSION-2 §6.4
explicitly avoids naming them. They should be retired in favour of
clients reading session state from normal Message flow
(ovos.utterance.handled or any other session-carrying Message).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* README + APPENDIX §1.0: establish voice OS framing

README intro replaced: "voice assistant ecosystem" → "voice
operating system" with an OS-analogy table (scheduler, IPC,
shared memory, process supervision, loadable modules, syscall ABI).

APPENDIX §1.0 (new): The voice operating system concept — two
conflations addressed: (1) voice assistant product (closed,
vertically integrated vs open platform); (2) LLM wrapper (LLMs
fit as pipeline plugins, utterance/dialog/metadata transformers —
one possible multi-role deployment, not the architecture itself).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* revert: move README voice-OS framing to its own PR (#28)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* APPENDIX: fix stale PIPELINE-1 refs; slim redundant prose

- owner_id → skill_id throughout (§3.2, §3.3, §4.5, §4.6)
- match(utterance,…) → match(utterances,…) (§4.5)
- Match.captures → Match.slots (§4.7)
- complete_intent_failure → ovos.intent.unmatched in §5.1/§5.3/§5.7;
  add rename row to §5.2 table
- Dispatch payload block in §5.3 rewritten: {lang, utterance, slots},
  handler-lifecycle uses {skill_id, intent_name, optional exception}
- §5.5: add ovos.intent.unmatched and ovos.utterance.speak entries
- §2.5 hassil: drop standalone subsection; fix §2 intro cross-ref
- §1.3 compat levels: condense to bullets
- §1.4: drop ovos-localize "honest notes" paragraph
- §3.1.3: trim to essential bus-substrate mechanics
- §4.7: trim per-type-explosion and per-type-self-id bullets
- §5.4: trim rename and match-contract entries

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
JarbasAl and others added 5 commits May 28, 2026 01:15
- §3.3: OVOS-CONVERSE-1 §3.4 → §3.3 (metadata-transformer hook)
- §3.3: CONVERSE-1 §5.4 → §5.3 (cancellation semantics)
- §5.1: OVOS-SESSION-1 §2.5 → §2.1 (deployment-default rule)
…ale and use cases to appendix §4.7

- Move 'Why this injection point' rationale blocks (§3.1–§3.6) to appendix/rationale.md §4.7
- Move 'Canonical use cases' and 'Where LLMs fit' lists to appendix
- Move 'Cross-cutting concerns' opening paragraph to appendix
- Move cancellation use-case list (§8) to appendix
- Move introspection aggregate-query rationale and consumer list to appendix
- Collapse §3.0 lang parameter redundant bullets into compact forms
- Collapse §5.1 'propagates unchanged' column to prose footnote-style
- Add [Informative] heading to §8.1 cancel_reason vocabulary table
- Add routing-key mutation to §9 conformance obligations
- Add observer conformance entry to §9
- Collapse §1.3 self-identification (cut provenance/coexistence examples)
- Fix TTS introspection topic typo (list.list.response → list.response)
- Fix §4.8→§4.7 cross-references (TRANSFORM-1 rationale lives in §4.7)

Co-Authored-By: opencode/glm-5.1 <noreply@opencode.ai>
- Add per-type injection-point rationale (why each is the only point
  for its class of work)
- Add per-type canonical use cases (audio, utterance, metadata,
  intent, dialog, TTS)
- Add per-type LLM-fit notes
- Add cross-cutting architectural value rationale
- Add cancellation in-spec use cases
- Add introspection surface rationale (no aggregate query, typical
  consumers)

Content moved from transformer.md normative body per editorial
overhaul — rationale and use-case lists belong in the appendix,
not in the prescriptive spec.

Co-Authored-By: opencode/glm-5.1 <noreply@opencode.ai>
Match.owner_id → Match.skill_id in 5 locations. skill_id is the
universal handler identity per the architecture model.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@JarbasAl JarbasAl changed the title OVOS-TRANSFORM-1: Transformer Plugins Specification (draft) OVOS-TRANSFORM-1: Transformer Plugins Specification May 28, 2026