OVOS-TRANSFORM-1: Transformer Plugins Specification #20
Draft
JarbasAl wants to merge 36 commits into
JarbasAl added a commit that referenced this pull request May 24, 2026
Per the new "dedicated APPENDIX PR" policy, consolidating the
prior-art and design-deviation notes from the OVOS-CONTEXT-1 (PR #18)
and OVOS-TRANSFORM-1 (PR #20) work into this PR. Those spec PRs are
now scoped to their own spec files only; the discussion / cross-spec
touchups / in-tree prior art all live here.
Adds to §4 Design rationale:
- "Intent context (CONTEXT-1)": the Adapt-only origins, the two-scope
(private/shared) formalization, jurebes / nebulento / palavreado as
prior art for excludes_context, the engine-side §5.3 mutation
pathway resolving the PIPELINE-1 §4.2 contradiction.
- "Transformer plugins (TRANSFORM-1)": the architectural-pattern
framing, intent transformers as the system-typing home, the nine
concrete in-tree plugins as prior art, the ascending-vs-descending
priority deviation called out, cancellation alignment with existing
plugin convention, and the language disambiguation hierarchy
mirroring current ovos-core code paths.
Removes from §7 Known gaps:
- "Intent context" bullet (formalized in CONTEXT-1).
- "The utterance-transformer chain" bullet (formalized in
TRANSFORM-1).
This was referenced May 24, 2026
Adds a new specification covering the six transformer plugin types
OVOS already runs informally (audio, utterance, metadata, intent,
dialog, TTS) as a single unified spec with one shared abstraction
and six per-type subsections.
Highlights:
- Six lifecycle hooks defined precisely against OVOS-PIPELINE-1's
per-utterance flow. Each hook runs an ordered chain of black-box
transformers; every transformer in the chain runs (no
claim-or-decline).
- Intent transformers (§3.4) called out as the spec'd home for
system-type entity injection (dates, numbers, durations,
ordinals, named entities) - the LLM-friendly place to apply
OVOS-INTENT-1 §5.3's deferred slot typing globally without each
skill rolling its own date parser.
- Per-type "Where LLMs fit" notes through §3.1-§3.6.
- Ascending priority ordering (lower = earlier; default 50),
aligning with the fallback-skill convention already in OVOS;
explicit deployer order wins when present. Current OVOS sorts
transformers descending - APPENDIX flags this as a normative
deviation for current plugins to flip.
- Per-session chain overrides via optional
session.{audio,utterance,metadata,intent,dialog,tts}_transformers
fields, parallel to OVOS-PIPELINE-1's session.pipeline.
Restricted / remote-peer sessions can disable LLM-backed
transformers they don't trust off-network.
- Passive registration index per type (transformer.*.list with
.response), same posture as PIPELINE-1's intent index and
CONTEXT-1's §5.4 context index.
- Robust error handling: transformer exceptions and shape
violations are treated as no-op transforms; the orchestrator logs
and proceeds. A single transformer's bug never aborts the
utterance (mirrors PIPELINE-1 §6.2).
- Non-goals: typed-value schemas, per-plugin behavioural
contracts, cross-transformer coordination, hot reload, timeout
policy.
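The ordering and error-handling bullets above can be sketched roughly as follows. This is a minimal illustration, not the spec's reference implementation: the `(utterances, context)` transformer signature, the `order_chain` / `run_chain` helper names, the tie-break on the identifier, and the choice to hand each transformer a shallow copy are all assumptions made for the example.

```python
import logging
from typing import Callable, Dict, List, Optional, Tuple

log = logging.getLogger("transformer-chain")

# Hypothetical transformer shape for this sketch:
# (utterances, context) in, (utterances, context) out.
Transformer = Callable[[List[str], dict], Tuple[List[str], dict]]

DEFAULT_PRIORITY = 50  # default named in the commit message


def order_chain(loaded: List[str],
                priorities: Dict[str, int],
                explicit_order: Optional[List[str]] = None) -> List[str]:
    """Explicit deployer order wins; otherwise ascending priority."""
    if explicit_order:
        return [tid for tid in explicit_order if tid in loaded]
    # Tie-break on the id is an assumption, only to keep the order stable.
    return sorted(loaded,
                  key=lambda tid: (priorities.get(tid, DEFAULT_PRIORITY), tid))


def run_chain(transformers: Dict[str, Transformer],
              priorities: Dict[str, int],
              utterances: List[str],
              context: dict,
              explicit_order: Optional[List[str]] = None
              ) -> Tuple[List[str], dict]:
    """Run every transformer in order; a failing one is a no-op."""
    for tid in order_chain(list(transformers), priorities, explicit_order):
        try:
            # Shallow copies so a discarded output leaves the prior value usable.
            new_utts, new_ctx = transformers[tid](list(utterances), dict(context))
            if not isinstance(new_utts, list) or not isinstance(new_ctx, dict):
                raise TypeError("shape violation")
        except Exception:
            log.exception("transformer %s failed; treated as a no-op", tid)
            continue
        utterances, context = new_utts, new_ctx
    return utterances, context
```

Every transformer in the chain is attempted (no claim-or-decline), and a raise or a malformed return leaves the previous step's output in place.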
PR scope is just transformer.md + README index + CHANGELOG entry
+ APPENDIX prior-art bullet. Cross-spec touches (PIPELINE-1 flow
annotation, MSG-1 §4 session-fields note, INTENT-1 §5.3 forward
reference) deferred to follow-up PRs.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Addresses every must-fix from the end-to-end review plus introduces
the cancellation semantics that current OVOS expresses informally
via ovos-utterance-plugin-cancel.
Must-fixes:
- §3.4 ordering with OVOS-CONTEXT-1 §5.3 disambiguated: engine-side
session mutation runs FIRST, intent-transformer chain runs second.
Lets an intent transformer read context the matching engine just
wrote (the natural enrichment direction).
- §3.4 owner_id/intent_name MUST NOT change is now backed by an
orchestrator enforcement clause: violations are treated as §7
shape violations, the orchestrator discards the transformer's
output and proceeds with the prior step's Match unchanged.
- §3.2 utterance transformers MAY now return an empty list (signals
"no plausible transcription"); covers the existing
transcription-validator pattern that the previous MUST contradicted.
- §3.4 / §3.3 capture/metadata deletion rules softened from MUST NOT
to SHOULD NOT, with carve-outs for filtering / redaction / PII
removal as the deployer-configured purpose.
- §1.1 / §2 — flow diagram explicitly framed as the canonical staged
flow this specification assumes. Streaming and end-to-end
implementations may omit hooks for stages they don't materialise,
per published conformance scope.
- §3.6 TTS "MUST NOT silently change perceived language" softened to
SHOULD NOT re-synthesize in a different language — vague
unenforceable rule clarified by reference to the staging
(translation is §3.5 dialog territory).
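As an illustration of the §3.4 enforcement clause in the list above, the sketch below discards an intent transformer's output when it changes `owner_id` or `intent_name`. The dict-shaped Match, the `captures` field, and the `apply_intent_transformer` helper are hypothetical; the commit message does not give the actual Match structure.

```python
import copy
from typing import Callable


def apply_intent_transformer(transform: Callable[[dict, dict], dict],
                             match: dict,
                             context: dict) -> dict:
    """Return the transformed Match, or the prior Match if the output is invalid."""
    try:
        # Deep copy so a rejected transformer cannot leave partial edits behind.
        candidate = transform(copy.deepcopy(match), context)
    except Exception:
        return match
    if not isinstance(candidate, dict):
        return match
    # Identity fields must survive unchanged; otherwise treat as a shape violation.
    for key in ("owner_id", "intent_name"):
        if candidate.get(key) != match.get(key):
            return match
    return candidate
```

A transformer that only enriches captures passes through; one that rewrites the intent identity is ignored and the chain continues with the prior Match.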
New §8 — Utterance cancellation:
Two surfaces sharing one termination contract:
- §8.1 in-band: a transformer signals cancellation by setting a
reserved cancel:{by, reason} key in its returned context.
Orchestrator inspects after every transformer; aborts remaining
chain and all subsequent stages on sight. Synchronous, race-free.
- §8.2 out-of-band: ovos.utterance.cancel bus event, emittable by
any participant (skills, devices, peers, debugging tools). The
orchestrator handles best-effort against in-flight lifecycles;
§8.4 documents the inherent async race and the consequence (late
cancels MUST be logged but cannot unwind side effects, consistent
with §7's no-rollback rule).
- §8.3 terminal events: cancellation terminates with
ovos.utterance.cancelled (new) followed by ovos.utterance.handled
(OVOS-PIPELINE-1 §9.5). Orchestrator MUST NOT emit
complete_intent_failure on the cancellation path — failure and
cancellation are distinct observables. The cancelled event carries
the same by/reason pair the cancellation signal carried.
Other review items addressed:
- §6 — adds an aggregate transformer.list query alongside the six
per-type queries, for debugging tools that don't want six
round-trips. priorities field in responses is now always returned
(no more "absent when explicit-order overrides priority"
cleverness). loaded vs chain distinction clarified.
- §5 — explicit wire-weight note: per-session override fields cost
zero for sessions that don't set them; deployers using them for
high-traffic sessions SHOULD keep lists short.
- §7 — concurrency contract (transformer instances are process-wide
and MUST be re-entrant; per-instance state MUST be guarded for
concurrent access) and no-rollback note (side effects performed
mid-chain are not unwound by later raises or §8 cancellations;
transformers needing transaction semantics implement them
internally).
Conformance (§9) and the new §8 rules thread through both
orchestrator and transformer MUST/MAY bullets.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two related reframes to put the spec into proper prescriptive
voice-OS-design territory:
A. Transformer chains as architectural pattern, not mandate
--------------------------------------------------------------
§1 / §2 / §9 reframed throughout: an orchestrator MAY implement
transformer chains at any subset of the six injection points
(including none). For each chain it implements, the corresponding
contract binds; for chains it does not implement, no obligation
arises. A null-implementation that runs no chains is conformant. The
spec defines the design pattern and the per-injection-point contract,
not a feature list.
§3 reframed: each of §3.1-§3.6 now opens with an explicit
**architectural rationale** explaining WHY this exact point in the
lifecycle is the right home for this kind of work: what artefact
exists here that doesn't exist elsewhere, what mutations are possible
here that aren't possible at any other point. The rationale is the
justification for the injection point existing in the spec at all;
the canonical use cases and LLM-fit notes follow from the rationale.
This makes the spec prescriptive (here's how a voice OS should be
structured and why) rather than descriptive (here's what current OVOS
does).
Per-rationale highlights:
- §3.1 audio: only point where unprocessed audio exists. STT
destroys prosody / acoustic-language / speaker info.
- §3.2 utterance: only point with text but no semantic commitment.
Mutations ripple uniformly through every downstream engine.
- §3.3 metadata: only point with the joint audio+text signal and no
intent commitment. Derive once, every consumer sees same value.
- §3.4 intent: only point with BOTH resolved intent identity AND
free-text captures. Engine-agnostic enrichment surface.
- §3.5 dialog: only point where response exists as final text and
TTS has not committed to how it sounds.
- §3.6 TTS: only point with the synthesized waveform; audio-domain
mutations have nowhere else to live.
B. Cancellation is exclusively a transformer plugin contract
--------------------------------------------------------------
§8 reframed: cancellation is signalled only by a transformer setting
the in-band `cancel` context flag. There is no
`ovos.utterance.cancel` bus event a third party can emit; the
orchestrator owns the cancellation machinery and exposes only the
plugin contract. Deployments that want out-of-band cancellation
(hardware buttons, peer signals, channel barge-in) ship a thin
transformer that watches for the trigger and sets the §8.1 flag. This
consolidates §8 from two surfaces to one, removes the §8.4 race
section (no out-of-band path = no async race), and makes the spec's
surface area for cancellation a single plugin contract.
New §10 non-goal: explicit "out-of-band cancellation channels" are
out of scope by design.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…tors
Reframes the metadata-transformer chain so its input and output are
explicitly the full Message.context (OVOS-MSG-1 §2.3), not a
sub-context object, and so its permitted mutations are unrestricted
across that surface.
Key changes to §3.3:
- Input: full Message.context including session carrier (§4 of
MSG-1), session.context (CONTEXT-1 §2), session.pipeline
(PIPELINE-1 §5), the six per-session transformer overrides (§5 of
this spec), routing keys (§3 of MSG-1), and any other top-level keys
other specs or earlier transformers have written.
- Permitted mutations: a metadata transformer MAY mutate
Message.context however it sees fit. Replaces the previous "MAY
add/update keys; SHOULD NOT remove keys it did not set; MUST NOT
modify reserved keys" with a single permissive rule: the chain is
the deployer's in-process Message.context manipulation surface;
loading a transformer is the deployer's authorization.
- Coordination guidance (SHOULD-level): notes the consequences of
mutating each companion-spec field: session.context bypasses
CONTEXT-1 §5 stamping; session.pipeline reroutes this utterance;
session.lang affects downstream localization; source/destination
affects routing for derived messages.
- New canonical use cases reflecting the expanded purview:
per-utterance language override (write to session.lang),
per-utterance pipeline switch (replace session.pipeline for one
turn), system context injection (write session.context entries
bypassing CONTEXT-1's bus events when provenance is not needed).
- LLM-fit note expanded: LLMs as per-utterance pipeline routers, not
only as metadata classifiers.
Conceptually: §3.3 is now the spec's "anything-goes Message.context
operator", distinct from §3.4 intent transformers (Match operators)
and from CONTEXT-1 §5 bus events (skill-side, provenance-stamped).
Three different mutation surfaces for three different deployment
intents.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…stress test
Reading ovos-bidirectional-translation-plugin/__init__.py end-to-end
exposed three ambiguities in the spec wording that the plugin works
around informally. All three are spec-side fixes (the plugin's
behaviour is reasonable and should be conformant under a tighter
reading).
§3.2 / §3.5 / §3.6 - input clarified as full Message.context
-------------------------------------------------------------
The plugin mutates session.lang from inside both its utterance
transformer (§3.2) and its dialog transformer (§3.5) halves. The
prior §3.2/§3.5 wording said "MAY merge keys into the context
object", which is ambiguous about whether session-internal mutations
qualify. §3.3's recent reframe explicitly empowered metadata
transformers for full Message.context mutation; the spec read as if
§3.2/§3.5/§3.6 had weaker rights.
Clarified across §3.2, §3.5, §3.6: the `context` argument is the full
`Message.context` (same surface §3.3 covers), and §3.3's permissive
mutation rules apply uniformly. §3.3's distinguishing trait is that
it has no primary artifact input (context is its only working
surface), not that it has special mutation rights other transformer
types lack.
§7 - mid-lifecycle session-mutation propagation made explicit
-------------------------------------------------------------
The plugin's bidirectional pattern depends on an unwritten invariant:
when a transformer mutates session.lang and re-serializes session
into context["session"], every downstream stage reads the mutated
value rather than a cached pre-mutation copy. The spec implied this
through OVOS-MSG-1 §5's forward/reply semantics but never stated it.
New §7 clause makes the propagation rule explicit and tells
downstream consumers they MUST read live session values from
in-flight Message.context.
§7 - cross-transformer coordination via namespaced context keys
---------------------------------------------------------------
The plugin uses bare top-level context keys (`was_translated`,
`output_lang`, `translate_dialogs`) to coordinate between its two
halves. Works here because both halves are the same plugin sharing a
convention; two unrelated plugins picking the same bare key would
collide. §10 non-goal already says cross-transformer coordination
protocols are out of scope, but plugins are doing it via convention
regardless.
New §7 clause: transformers SHOULD namespace ad-hoc coordination keys
with their transformer_id as a prefix (e.g.
`ovos-utterance-translation-plugin.output_lang` instead of bare
`output_lang`). The spec defines no central registry; namespacing is
what makes that absence safe.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
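The namespacing guidance can be illustrated with a short sketch. The `namespaced` and `mark_translated` helpers are hypothetical names for this example; only the `<transformer_id>.<key>` shape and the example id come from the commit message.

```python
TRANSFORMER_ID = "ovos-utterance-translation-plugin"  # example id from the commit


def namespaced(transformer_id: str, key: str) -> str:
    """Prefix an ad-hoc coordination key with the owning transformer's id."""
    return f"{transformer_id}.{key}"


def mark_translated(context: dict, output_lang: str) -> dict:
    """Utterance-side half: record coordination state for the dialog-side half."""
    context[namespaced(TRANSFORMER_ID, "was_translated")] = True
    context[namespaced(TRANSFORMER_ID, "output_lang")] = output_lang
    return context


def should_translate_back(context: dict) -> bool:
    """Dialog-side half: read only this plugin's own namespaced key."""
    return bool(context.get(namespaced(TRANSFORMER_ID, "was_translated")))
```

An unrelated plugin writing a bare `output_lang` key no longer collides with the translation plugin's state.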
…onvention
Stress-tested §8 against two in-tree utterance transformers that
already implement cancellation today:
- ovos-utterance-plugin-cancel (NevermindPlugin) — matches locale
cancel words, returns [], {"canceled": True, "cancel_word": ...}
- ovos-transcription-validator-plugin (TranscriptionValidatorPlugin) —
LLM-based STT validation, returns [], {"canceled": True,
"cancel_word": "[MISTRANSCRIPTION]"} when LLM rejects the
transcription
Both use the same de facto convention: empty list as utterance
output, with `canceled: true` and `cancel_word: <reason>` as
top-level context keys. The previous §8.1 draft proposed a
different nested shape (`cancel: {by, reason}`). Adopted the
existing convention into the spec; the principle is the same one
that drove §3.3's reframe — trust real prior plugin design rather
than imposing a new shape.
§8.1 changes:
- Cancellation signal is two flat keys: `canceled: bool` and
`cancel_word: string`. Both MUST be present together when
signalling; one without the other is a §7 shape violation.
- Orchestrator stamps `cancel_by: <transformer_id>` automatically
on observing the signal (parallels OVOS-CONTEXT-1 §5.2's
origin-stamping rule — transformers can't impersonate each
other's cancellations).
- Explicit note that when `canceled: true` is observed alongside
an empty utterance list, the flag is the signal — the empty
list is convention, not the trigger.
§3.2 changes:
- Distinguishes empty list alone (no plausible transcription →
complete_intent_failure) from empty list + §8.1 keys
(cancellation → ovos.utterance.cancelled). Both NevermindPlugin
and TranscriptionValidatorPlugin are the cancellation case;
spec now spells out which terminal event each shape produces.
§8.2 and §9 conformance updated to use the new key names
(cancel_word + cancel_by instead of by/reason). The transformer
MUST bullet now says "set canceled: true and cancel_word in
returned context; orchestrator stamps cancel_by".
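A rough sketch of the §8.1 / §3.2 rules as this commit describes them: a transformer returns an empty list plus the two flat keys, and the orchestrator validates the pair, stamps `cancel_by`, and picks the terminal event. The function names, the `CANCEL_WORDS` vocabulary, and the handling of `canceled: false` (treated here as "no cancellation") are assumptions for illustration.

```python
from typing import List, Optional, Tuple

CANCEL_WORDS = {"nevermind", "cancel that"}  # hypothetical locale vocabulary


def nevermind_transform(utterances: List[str],
                        context: dict) -> Tuple[List[str], dict]:
    """Utterance transformer: signal cancellation when a cancel word is heard."""
    for utt in utterances:
        if utt.strip().lower() in CANCEL_WORDS:
            return [], {**context, "canceled": True, "cancel_word": utt}
    return utterances, context


def check_cancellation(transformer_id: str, context: dict) -> str:
    """Orchestrator side, run after each transformer returns."""
    has_flag, has_word = "canceled" in context, "cancel_word" in context
    if not has_flag and not has_word:
        return "continue"
    if has_flag != has_word:
        return "shape_violation"  # one key without the other
    if context["canceled"] is True:
        context["cancel_by"] = transformer_id  # stamped by the orchestrator
        return "cancelled"
    return "continue"  # assumption: canceled=False is not a signal


def terminal_event(utterances: List[str], context: dict) -> Optional[str]:
    """The flag is the signal; the empty list alone means no transcription."""
    if context.get("canceled") is True:
        return "ovos.utterance.cancelled"
    if not utterances:
        return "complete_intent_failure"
    return None
```

Keeping the stamp on the orchestrator side is what prevents one transformer from attributing a cancellation to another.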
LLM stress-test note: TranscriptionValidatorPlugin's design (LLM
classifier called inside transform, reprompt-or-error fire-and-
forget bus emit) validates §9's "MAY access the bus for
side-effects, SHOULD NOT depend on synchronous bus responses".
Concrete real-world example of the spec's *Where LLMs fit* §3.2
note - direct prior art.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The §8.1 cancellation signal's "why" field is renamed cancel_word ->
cancel_reason. cancel_word was incidental plugin metadata in
ovos-utterance-plugin-cancel and ovos-transcription-validator-plugin
(the actual cue / sentinel they happened to surface); the spec's
normative field is the structured concept "why was this cancelled",
which reads more cleanly as cancel_reason.
Plugins MAY continue to set their own cancel_word (or any other
top-level metadata) alongside the §8.1 keys; that's plugin-specific
and outside the spec's contract. §8.1 calls this out explicitly and
points at §7's namespacing guidance for ad-hoc keys.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…s/dialog-normalizer audit
Reading three more in-tree transformer plugins surfaced three small
but real spec ambiguities. All three are spec-side fixes.
ovos-utterance-normalizer (UtteranceNormalizerPlugin):
Triples the utterance list per input: for each candidate emits
expand_contractions(u), original u, normalize(u). Deduplicates.
Proves §3.2 explicitly supports list expansion (already implicit in
"MAY rewrite, expand, or contract").
ovos-utterance-corrections-plugin (UtteranceCorrectionsPlugin):
XDG-stored full-utterance / regex / word replacement tables. Mutates
utterances list IN PLACE (utterances[idx] =
compiled_pattern.sub(...)). Step-1 full replace runs against
utterances[0] only; steps 2-3 iterate the whole list. Surfaces the
"utterances[0] is primary" implicit convention and the in-place vs
new-list mutation ambiguity.
ovos-dialog-normalizer-plugin (DialogNormalizerTransformer):
Canonical example of §3.5's "why this injection point" rationale:
"I'm Dr. Prof. 12345 €" → "I am Doctor Professor twelve thousand
three hundred forty-five euros". This can't live in skills (which
shouldn't know TTS handling of currency symbols) and can't live in
TTS transformers (operating on audio bytes).
Spec changes:
- §3.2: utterances[0] formalized as primary candidate. Later indices
are alternatives downstream matchers MAY try. Codifies the de facto
convention seen across all five §3.2 plugins audited so far.
- §3.2: in-place mutation MAY be performed, or a new list returned;
both are conformant. ovos-utterance-corrections-plugin (in-place)
and the others (new lists) currently exercise both shapes; spec now
permits both explicitly.
- §7: guidance on reading language from Message.context resolves a
real cross-plugin inconsistency:
  - context["lang"] only (cancel, validator, normalizer)
  - sess.lang only (dialog normalizer)
  - both with fallbacks (translator, validator)
session.lang is now spec'd as the canonical preferred-language
signal; top-level context["lang"] (when present) is the
per-utterance override populated from data.lang per OVOS-MSG-1 §4.2.
Transformers MUST tolerate both shapes.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
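A transformer that tolerates both shapes might read the language along these lines. This is a sketch of the guidance as stated in this commit only; the `read_lang` name and the `default` fallback are assumptions, and the serialized session is assumed to sit under `context["session"]` as the translation-plugin commit describes.

```python
def read_lang(context: dict, default: str = "en-US") -> str:
    """Read the effective language from an in-flight Message.context."""
    # Top-level context["lang"], when present, is the per-utterance override.
    override = context.get("lang")
    if override:
        return override
    # Otherwise session.lang is the canonical preferred-language signal.
    session = context.get("session") or {}
    return session.get("lang") or default
```

Reading the session from the live context on each call, rather than caching it, also matches the propagation rule for mid-lifecycle session mutations.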
…olution
Per the new "1 file per PR / self-contained / timeless" policy:
- Reverts the APPENDIX.md / CHANGELOG.md / README.md edits this
branch accumulated. Each of those belongs in its own dedicated PR
(APPENDIX per the new "dedicated PR just for appendix" rule;
CHANGELOG and README as their own scoped PRs).
- Spec body is fully self-contained: no cross-references to APPENDIX
for normative meaning, all per-§3.x rationales / canonical use cases
/ "where LLMs fit" notes stay in the spec body where readers can
find them without leaving the file.
- Adds §7.1 Language resolution and the disambiguation hierarchy:
six-level precedence chain (stt_lang > request_lang > detected_lang
> data.lang > existing session.lang > config default), valid_langs
gating, deprecation of top-level Message.context["lang"],
per-injection-point producer responsibilities. Resolved value is
written to session.lang. Orchestrator MUST resolve before running
the §3.2 utterance chain.
Bumps §1.1 scope list and §9 orchestrator conformance to reference
§7.1.
PR net diff is now a single file: transformer.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
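The six-level precedence chain could be resolved roughly as below. The commit message does not say where each signal lives on the wire or exactly how valid_langs gating behaves, so this sketch takes the signals as a flat dict (with `data.lang` flattened to a hypothetical `data_lang` key) and assumes gating simply skips candidates outside `valid_langs`, with the config default as an ungated last resort.

```python
from typing import Iterable, Optional

# Highest precedence first; the config default is handled separately.
PRECEDENCE = ("stt_lang", "request_lang", "detected_lang",
              "data_lang", "session_lang")


def resolve_lang(signals: dict,
                 config_default: str,
                 valid_langs: Optional[Iterable[str]] = None) -> str:
    """Pick the first usable signal in precedence order."""
    allowed = set(valid_langs) if valid_langs else None
    for name in PRECEDENCE:
        value = signals.get(name)
        if not value:
            continue
        if allowed is not None and value not in allowed:
            continue  # assumed gating: ignore languages the deployment rejects
        return value
    return config_default
```

Per the commit, the resolved value is what gets written to session.lang before the §3.2 utterance chain runs.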
Reframes the spec to explicitly accommodate the deployment shape
current OVOS already uses — orchestrator split along the audio
boundary into cooperating processes (audio-input service / utterance-
handling service / audio-output service). From the spec's
perspective these are all "the orchestrator"; the split is a
deployment / containerization choice the spec allows but does not
prescribe.
The substantive consequence: no single orchestrator process has a
global view of all loaded transformers. The audio-input service
doesn't know what dialog transformers the audio-output service
loaded, and vice versa. The introspection surface is designed
around this.
§1 — new framing paragraph: "The orchestrator is a logical role
and MAY be implemented as multiple cooperating processes." Names
the audio-input / utterance-handling / audio-output split current
OVOS uses as the canonical example. Notes the no-global-view
consequence. The transformer_id definition reflects the split:
each process holds its slice of the mapping; the union across
processes is the full loaded set.
§6 — rewritten as broadcast-query / scatter-response. No central
registration index; each orchestrator process answers
transformer.{type}.list and transformer.list with its own local
slice. Responses carry loaded + priorities only (no global chain
order — composition is the §4 priority + §5 override combined
across the union). Pull-only discovery; no register/deregister
handshake. Single-process deployments answer fully from one
reply; split deployments from several.
§9 — conformance: each orchestrator process that implements one
or more chains MUST meet the per-process introspection
obligations for the chains it implements. Composition of
per-process responses is the orchestrator's full view.
§10 — non-goal: cross-process invocation topic shape left open.
Spec defines introspection (§6) and IO contracts (§3); deployers
pick whatever cross-process request/response convention fits
their substrate.
§5 — per-session override resolution updated to operate "over the
set of transformers it can reach" (in-process or via another
orchestrator process).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…rce of truth
Refines §6's discovery model to better match async-bus reality:
- Orchestrator processes MAY volunteer a one-shot load-time
announcement (push) of their loaded set, on the same .response topic
a pull query would land on. This is a convenience for consumers
already listening (a monitoring service that came up before the
orchestrator process).
- Consumers MUST NOT rely on having received any prior announcement.
Load ordering between producers and consumers is not guaranteed: a
consumer that starts after the announcement fired has missed it, and
the bus is async with no catch-up channel for missed broadcasts.
- A consumer that needs accurate state MUST query.
§9 conformance updated:
- Per-process MAY clause for load-time announcements.
- New "consumers" MUST clause: query when accuracy matters; do not
assume announcements reached them.
Replaces the previous strict "MUST NOT spontaneously broadcast" rule,
which over-rotated against a useful convenience pattern (load-time
announcements) that's harmless as long as consumers don't depend on
them.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…raming
Two refinements:
1. Drop the aggregate `transformer.list` query. Only per-plugin-type
topics remain: transformer.audio.list, transformer.utterance.list,
transformer.metadata.list, transformer.intent.list,
transformer.dialog.list, transformer.tts.list. An aggregate would
imply a single responder with a global view, which this spec
doesn't assume exists. Consumers wanting multiple types issue
multiple queries.
2. Generic voice-OS framing instead of "OVOS" references:
- §1: "natural homes for this kind of work in a voice operating
system's utterance lifecycle" (was "the OVOS architecture
recognizes")
- §1 split-orchestrator paragraph: presents the audio-input /
utterance-handling / audio-output split as "a natural split
along the audio boundary" rather than "in current OVOS". The
pattern is the prescribed shape, not a description of one
implementation.
§6 + §9 updated for the no-aggregate change: per-type subscriptions
only, no aggregate topic in conformance bullets.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…, lang in SESSION-1
§1: split-orchestrator framing now references PIPELINE-1 §2 instead
of restating; loses several paragraphs of duplicated narration.
§3.3 metadata transformers: new orchestrator-stamped metadata_by
audit trail (list of transformer_ids, in chain order, appended when
context is actually modified). Parallel to cancel_by for the
cancellation chain. Plugin MUST NOT write it itself.
§5 per-session overrides: drop field restatement; reference
SESSION-1 §2.1 claim and §2.5 deployment-default fallback. Tighten
partial-unknown rule (orchestrator MUST NOT fall back to deployment
default merely because one identifier is unknown).
§7 error handling: MUST log → SHOULD log (logging is deployment
concern; catch-and-proceed is the load-bearing contract).
§7.1 lang resolution: full disambiguation hierarchy removed and
moved to SESSION-1 §3.2 (where the lang signals are now claimed as
session fields). Replaced with one short paragraph naming which
transformer types are natural producers of which signals.
Consolidation policy explicitly deferred to SESSION-1 §3.2.7.
§8.1 cancellation: same MUST log → SHOULD log downgrade for
shape-violation path.
§9 conformance: add metadata_by stamping line; add cancellation
handling checklist line (closes drift: §8's six MUSTs were not
restated in conformance).
See also: add SESSION-1.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ata_by SHOULD
§6 introspection topics: rename transformer.<type>.list →
ovos.transformer.<type>.list (and .response). Aligns with the
ovos.<domain>.<verb> convention shared with INTENT-4 / PIPELINE-1.
§5 per-session overrides: replace the slim "claims six fields" table
with a full SESSION-1 §2.1 6-point registry table (wire type /
propagation / scope / deployment-default per field). The gap
SESSION-1 §2.1 was written to prevent is now closed.
§3.3 metadata_by: downgrade orchestrator MUST-stamp to SHOULD.
Detecting "modified context" without a heavyweight change-detection
contract is impossible in general; best-effort traceability is
honest. Consumers MUST treat as hint, not authoritative provenance.
§9 conformance: corresponding MUST → SHOULD for metadata_by.
§1 split-orchestrator: reduce to one-sentence PIPELINE-1 §2
reference + one paragraph naming the natural transformer-chain
partitioning. No restatement of the audio-boundary framing.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
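For a consumer, the renamed per-type topics and the scatter-response model from the earlier §6 commits might look like this. It is a bus-agnostic sketch: `list_topics` and `merge_responses` are hypothetical helpers, and the response payload is assumed to be a dict with `loaded` (list of ids) and `priorities` (id to integer) keys, which the commits name but do not fully specify.

```python
from typing import Dict, Iterable, List, Tuple

TRANSFORMER_TYPES = ("audio", "utterance", "metadata", "intent", "dialog", "tts")


def list_topics(prefix: str = "ovos.transformer") -> Dict[str, Tuple[str, str]]:
    """Map each transformer type to its (query, response) topic pair."""
    return {t: (f"{prefix}.{t}.list", f"{prefix}.{t}.list.response")
            for t in TRANSFORMER_TYPES}


def merge_responses(responses: Iterable[dict]) -> dict:
    """Union the local slices answered by each orchestrator process."""
    loaded: List[str] = []
    priorities: Dict[str, int] = {}
    for response in responses:
        for tid in response.get("loaded", []):
            if tid not in loaded:
                loaded.append(tid)
        priorities.update(response.get("priorities", {}))
    return {"loaded": loaded, "priorities": priorities}
```

A single-process deployment yields one response per query; a split deployment yields several, and the union is the consumer's view of that type.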
…y warnings
metadata_by was speculative and could not be reliably enforced
(orchestrator cannot detect in-place mutations without a heavyweight
change-detection contract the spec deliberately did not define). Even
as a hint, it duplicated information already deterministically
available via §6 introspection + chain order.
Removed entirely:
- §3.3 audit-trail prose
- §3.3 best-effort / consumers-treat-as-hint paragraph
- §3.3 cancel_by-analogue framing
- §9 conformance bullet
Replaced with one sentence pointing readers at §6 + chain order as
the deterministic attribution path.
Also reframes the reserved-key guidance: drop the empty "SHOULD
coordinate" wrapper (which contradicted the unlimited-mutation
permission three paragraphs up); keep the bullet list of
consequence-bearing keys as a deliberate-not-blindly note.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… §5.3
Two cross-spec audit fixes:
- §1.2 (new): transformers MUST stamp
Message.context['transformer_id'] on every bus emission, parallel to
OVOS-INTENT-4 §3.1's context['skill_id'] and OVOS-PIPELINE-1 §3.1's
context['pipeline_id'] rules. Reserved-key precedence: the three
identity keys are mutually exclusive on a single Message; consumers
observing more than one SHOULD treat as malformed. Includes the
emitter vs subject distinction (context['transformer_id'] vs
data['transformer_id']) and orchestrator-side loader enforcement.
Downstream specs (CONTEXT-1 §5.2 attribution of transformer-emitted
ovos.context.set) now have a wire-level source.
- §9 conformance: previously said transformers SHOULD NOT mutate
session.intent_context directly, with a carve-out only for intent
transformers. That contradicted CONTEXT-1 §5.3, which normatively
permits any transformer holding an in-flight session to mutate
intent_context directly. Rewrote the MAY bullet to permit direct
mutation for any transformer type, pointing at CONTEXT-1 §3 / §5.3
for the key-shape rules (private entries prefixed by transformer_id,
or skill_id when writing on behalf of a specific skill). Also added
the §1.2 emission rule to the bus-side-effects bullet.
- See also: tightened the CONTEXT-1 entry. Direct mutation is no
longer characterised as 'should not bypass'; intent_context is no
longer 'read-mostly' for transformers. The choice between bus events
and direct mutation is the transformer's per §9.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…coexist
Same correction as PIPELINE-1 §3.1: forward/reply preserve upstream
identity stamps, and a transformer running over a chain that already
carries skill_id and/or pipeline_id additionally stamps
transformer_id without stripping the others. The three keys coexist
along the derivation chain; attribution consumers pick one via the
most-specific-wins precedence codified in CONTEXT-1 §5.2.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ation keys
Two structural changes settled in design discussion:
1. §1.1 (new): formal transformer-identity definition. Identity is
the pair (type, transformer_id). Type is one of six (audio,
utterance, metadata, intent, dialog, tts) and fixes the injection
point. transformer_id is opaque, unique within its type's registry.
Multi-type plugins MAY share a transformer_id across types; types
are independent registries. Wire-encoding of the pair: type via
context-key choice, transformer_id via the value.
2. §1.3 (replacing the old §1.2 self-identification): the spec now
claims SIX context keys, one per transformer type
(audio_transformer_id, utterance_transformer_id,
metadata_transformer_id, intent_transformer_id,
dialog_transformer_id, tts_transformer_id) instead of one generic
transformer_id. Rationale: preserves role across the six-stage
chain, disambiguates multi-type plugins, mirrors SESSION-1's
per-type *_transformers partitioning. The MUST-stamp rule now spans
both origination and modify-in-place (transformers' common mode is
mutating the message they were handed; that act binds the stamp
obligation just as bus emission does). Overwrite-last applies within
a single type's chain; across types, the six keys coexist. Identity
keys from other component-types (skill_id, pipeline_id) coexist with
the transformer keys; none are stripped by derivation.
Old §1.1 Scope is now §1.2 (Scope is unchanged in content).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ce + derivations stamp
Two structural refinements:
1. The six self-identification context keys move from single-string to list-valued, renamed plural: audio_transformer_ids, utterance_transformer_ids, metadata_transformer_ids, intent_transformer_ids, dialog_transformer_ids, tts_transformer_ids. Each holds an ordered list of transformer_ids that touched the Message, preserving full chain provenance per type on the wire (where previously overwrite-last lost earlier-chain entries). Rationale: chains are the defining transformer shape; the wire should carry the history. skill_id and pipeline_id stay as single strings: they originate, they don't chain. The 'ensure self is last element' formulation makes the stamp rule self-idempotent and handles origination, modify-in-place, and re-entry uniformly.
2. The stamp rule applies to derivations (Message.forward / .reply / .response) when the transformer is the component performing the derivation and placing the resulting Message on the bus. The derivation mechanism is irrelevant; what matters is that the transformer caused a Message to appear on the bus.
§1.1 wire-encoding note and the cancel_by stamping reference in §9 conformance updated to plural names. The singular <type>_transformer_id naming is now explicitly not used.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Existing wording said producers 'MUST NOT populate ... values that match the deployment default'. SESSION-1 §3.4 establishes the canonical rule as SHOULD (not MUST) and lists [] as wire-equivalent to omission. Aligned: now points at §3.4 (corrected reference, previously §3.3 before renumbering) and uses SHOULD. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…tion
Completing the blacklist family for symmetry with PIPELINE-1 §5: PIPELINE-1 owns blacklisted_skills / blacklisted_intents / blacklisted_pipelines; TRANSFORM-1 now owns the per-type transformer denylists, one per injection point.
§5 restructured into three subsections:
- §5.1 Per-type chain ordering (existing *_transformers, the preference channel; unchanged in content, framed as preference).
- §5.2 Per-type denylists (new): blacklisted_audio_transformers, blacklisted_utterance_transformers, blacklisted_metadata_transformers, blacklisted_intent_transformers, blacklisted_dialog_transformers, blacklisted_tts_transformers. Orchestrator-only single-tier filter (no two-tier backstop like skills/intents, because transformers don't return match candidates; the orchestrator drives the chain directly). Policy overrides preference.
- §5.3 Composition mirrors PIPELINE-1 §5.5 three-stage layering: preference (from §5.1 or deployer default) → availability (drop unloaded) → policy (drop denylisted). Per injection point. Empty effective chain = no transformers run at that stage, artifact passes through. Layer-2 substrate authorization framing parallels PIPELINE-1 §5.6.
All twelve session fields follow the SHOULD-omit / []-equivalent-to-omission rule of OVOS-SESSION-1 §3.4.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
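The three-stage layering described in this commit (preference → availability → policy) can be sketched roughly as below. This is an illustrative sketch only, not code from the spec or from ovos-core: the function name `effective_chain` and its parameters are hypothetical, and it assumes an omitted or empty session field defers to the deployer default, per the []-equivalent-to-omission rule cited above.

```python
from typing import Iterable, List, Optional


def effective_chain(
    session_preference: Optional[List[str]],
    deployer_default: List[str],
    loaded: Iterable[str],
    denylist: Optional[List[str]],
) -> List[str]:
    """Compute the transformer chain for ONE injection point (hypothetical helper)."""
    # Stage 1, preference: None and [] both defer to the deployer default.
    chain = list(session_preference) if session_preference else list(deployer_default)

    # Stage 2, availability: drop transformers that are not loaded.
    loaded_set = set(loaded)
    chain = [t for t in chain if t in loaded_set]

    # Stage 3, policy: drop denylisted transformers; policy overrides preference.
    denied = set(denylist or [])
    return [t for t in chain if t not in denied]
```

An empty result means no transformers run at that stage and the artifact passes through unchanged. The same function would be called once per injection point with that type's `*_transformers` and `blacklisted_*_transformers` values.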
A transformer that .forward's a Message it did not modify MUST NOT append its own transformer_id for that derivation — the inherited <type>_transformer_ids list rides through untouched, preserving upstream chain provenance. Modify-in-place still binds the stamp obligation; .reply and .response are authorial and MUST stamp. Symmetric with the analogous rules in PIPELINE-1 §3.1 (pipeline_id) and INTENT-4 §3.1 (skill_id). Consistent forward-is-propagation semantics across all three component-identity surfaces. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
'The most common LLM hook today' is temporal meta-commentary. Replace with 'A natural injection point for language models', which is a timeless characterization. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Resolves the long-standing footgun where transformers had no direct access to the utterance/dialog/TTS content language and were reaching for session.lang (user preference, not content) or relying on the orchestrator copying data.lang into context.lang as a workaround. New §3.0: common contract for the lang parameter across the three text-bearing chains (§3.2 utterance, §3.5 dialog, §3.6 TTS). Orchestrator sources lang from Message.data.lang; passes it through when present; MUST NOT synthesize from session.lang or other signals when absent. Consumer (transformer) decides how to resolve absence per its own policy. Audio (§3.1), metadata (§3.3), and intent (§3.4) transformers do not receive the parameter — audio is pre-STT, metadata is context-only with no artifact, intent has Match.lang by construction. Per-type Input sections updated for §3.2, §3.5, §3.6. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Audio transformers can legitimately receive lang when the producer authoritatively knows it — a UI language selector, an upstream audio language-detector plugin, a device-configured language, a test fixture, or an STT decoder run with a fixed language hypothesis. The spec makes no claim about source; presence alone is authoritative. Updated §3.0 to list four chains (audio, utterance, dialog, TTS) that receive lang. Metadata (§3.3) and intent (§3.4) remain excluded for their existing reasons (context-only artifact; Match.lang already authoritative). §3.1 audio transformer Input updated to include lang. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…s back to data.lang
A bidirectional-translation transformer (and any other transformer that mutates the artifact's language, such as language-detectors that set lang where it was None, or transformations that obscure the prior tag) needs to communicate the post-transformation language to downstream stages. Making lang only an input meant data.lang diverged from the artifact's actual language after translation.
§3.0 now specifies bidirectional propagation:
- The orchestrator passes lang in on each transformer call.
- Each transformer returns a possibly-mutated lang alongside its modified artifact and context.
- The orchestrator threads (artifact, lang) into the next transformer in the chain.
- At chain end, the orchestrator MUST write back the final lang value to Message.data.lang on the relevant Message: it sets data.lang when the value is non-None, and unsets it when the value is None and the field was previously present. The chain's conclusion is authoritative for downstream stages.
The orchestrator still MUST NOT synthesize lang from session.lang or other signals; lang comes only from the artifact/transformer flow.
§3.1 (audio), §3.2 (utterance), §3.5 (dialog), §3.6 (TTS) Output sections updated to list lang explicitly alongside the artifact and context.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
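The bidirectional lang propagation described in this commit might look roughly like the following on the orchestrator side. This is a sketch under stated assumptions: the `run_chain` name, the `(artifact, lang, context)` call signature, and the `"utterances"` payload key are hypothetical and are not taken from the spec text or any OVOS library.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

# Hypothetical transformer signature: takes and returns (artifact, lang, context).
Transformer = Callable[
    [Any, Optional[str], Dict[str, Any]],
    Tuple[Any, Optional[str], Dict[str, Any]],
]


def run_chain(
    data: Dict[str, Any],
    context: Dict[str, Any],
    chain: List[Transformer],
    artifact_key: str = "utterances",  # assumed payload key, for illustration
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Thread (artifact, lang) through a chain, then write lang back to data."""
    artifact = data[artifact_key]
    # lang is sourced from data only; it is never synthesized from session.lang.
    lang = data.get("lang")

    for transform in chain:
        artifact, lang, context = transform(artifact, lang, context)

    data[artifact_key] = artifact
    if lang is not None:
        data["lang"] = lang      # set when the chain concluded with a language
    else:
        data.pop("lang", None)   # unset when the chain concluded with None
    return data, context
```

A language detector in this model returns a non-None lang where it received None, and a translator returns the target language next to the translated artifact; either way the value the chain ends with is what lands in `data["lang"]`.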
…ncel_reason vocabulary; drop §1.3 idempotency clause
Three audit findings actioned:
- §3.3 metadata transformer routing-key mutation: previously 'MAY mutate however it sees fit' with a mild 'consequences worth being deliberate about' disclaimer. Tightened to SHOULD NOT mutate source/destination unless the transformer's deliberate role is re-routing; transformers that do MUST understand the MSG-1 §5 derivation consequences.
- §8.1 cancel_reason vocabulary: was free-form by default, making observability/audit brittle. Mint five reserved values (stop_word, transcription_invalid, policy_block, parental_control, other) for the common cases; transformers SHOULD use one. Free-form remains conformant, deployers are encouraged to coordinate. other is the universal fallback for transformers that don't want to think about vocabulary.
- §1.3 list-append rule: drop the 'ends with self -> no-op' idempotency clause. The intended invariant is 'list records every touch including re-entry', preserving chain provenance verbatim. The no-op-on-re-entry rule destroyed exactly the signal the list exists to capture. Consumers that want to collapse runs MAY do so at read time.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
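With the idempotency clause dropped, the list-append stamp rule reduces to "append on every touch". A minimal sketch, assuming a plain dict stands in for Message.context; the `stamp` helper and the example ids are hypothetical and are not part of any OVOS API:

```python
from typing import Any, Dict

TRANSFORMER_TYPES = ("audio", "utterance", "metadata", "intent", "dialog", "tts")


def stamp(context: Dict[str, Any], transformer_type: str, transformer_id: str) -> Dict[str, Any]:
    """Return a copy of context with transformer_id appended to its per-type list."""
    if transformer_type not in TRANSFORMER_TYPES:
        raise ValueError(f"unknown transformer type: {transformer_type!r}")
    key = f"{transformer_type}_transformer_ids"

    new_context = dict(context)  # skill_id / pipeline_id and other keys ride through
    # Always append, even when this id is already the last element:
    # the list records every touch, including re-entry.
    new_context[key] = list(context.get(key, [])) + [transformer_id]
    return new_context
```

Under the forward-is-propagation rule from the earlier commit, a transformer that forwards a Message it did not modify would simply not call `stamp`, so the inherited list passes through untouched. Consumers that only care about distinct ids can collapse repeated runs at read time.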
CONVERSE-1 §3.4 (PR #25) explicitly cites the metadata-transformer hook as the recommended position for mutating session.active_handlers and session.response_mode. Add this to the §3.3 list of permitted mutations, with the §5.4 cancellation-semantics back-reference for mid-wait holder changes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The "on cancellation per §8 — stop the current chain..." bullet was a near-verbatim duplicate of the preceding bullet. Merged the two into one complete statement. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
8ac8212 to 270c465
JarbasAl added a commit that referenced this pull request May 26, 2026
Per the new "dedicated APPENDIX PR" policy, consolidating the prior-art and design-deviation notes from the OVOS-CONTEXT-1 (PR #18) and OVOS-TRANSFORM-1 (PR #20) work into this PR. Those spec PRs are now scoped to their own spec files only; the discussion / cross-spec touchups / in-tree prior art all live here. Adds to §4 Design rationale: - "Intent context (CONTEXT-1)" — the Adapt-only origins, the two-scope (private/shared) formalization, jurebes / nebulento / palavreado as prior art for excludes_context, the engine-side §5.3 mutation pathway resolving the PIPELINE-1 §4.2 contradiction. - "Transformer plugins (TRANSFORM-1)" — the architectural- pattern framing, intent transformers as the system-typing home, the nine concrete in-tree plugins as prior art, the ascending-vs-descending priority deviation called out, cancellation alignment with existing plugin convention, and the language disambiguation hierarchy mirroring current ovos-core code paths. Removes from §7 Known gaps: - "Intent context" bullet (formalized in CONTEXT-1). - "The utterance-transformer chain" bullet (formalized in TRANSFORM-1).
JarbasAl added a commit that referenced this pull request May 26, 2026
* docs: README — full spec-set refresh for the in-flight stack Update the README to reflect the full spec set landing together: the original intent stack (INTENT-1/-2/-3, MSG-1) plus the in-flight specs (INTENT-4, SESSION-1, SESSION-2, PIPELINE-1, TRANSFORM-1, CONTEXT-1, CONVERSE-1). Changes: - Specification table reorganised into three stacks — intent, bus, orchestrator — each with a one-paragraph narrative. This is the structure APPENDIX §1.2 already uses; the README now mirrors it for consistency. - New 'Where to start' section with four reading-order paths matching common audiences: skill author, plugin author, orchestrator author, architecture surveyor. Addresses the 'no clear entry point' friction first-time readers hit when the set went from 4 to 11 specs. - New 'How this compares to other voice frameworks' section summarising APPENDIX §2's positioning (Home Assistant / hassil, Rasa, Alexa / Dialogflow, Rhasspy / Hermes, Wyoming). Brief; points at APPENDIX for detail. - Reference-implementation section split: ovos-spec-tools covers the intent stack; bus and orchestrator stacks are acknowledged as not-yet-having-ground-up-reference-impl with pointer to APPENDIX §5 divergence catalogue. - New 'Implementation status' section: clarifies the spec-set Draft→stable transition is tracked at #5; intent stack is most aligned with current ovos-core; known gaps cited from APPENDIX §7. - Contributing section adds the one-file-per-PR rule (per AGENTS.md repo convention) and clarifies dev vs master targeting. - Updated draft warning to reference APPENDIX §5 divergence catalogue and link to #5. No normative-spec changes; README and supporting-metadata only. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * README: establish voice OS framing; add OS-analogy table Replace "voice assistant ecosystem" opening with "voice operating system" framing. 
Add "What a voice operating system is" section with OS-analogy table (scheduler, IPC, shared memory, process supervision, loadable modules, syscall ABI) and the portability consequence. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * README: full spec table — three stacks, open PR links Split specs into intent / bus / orchestrator stacks. Add all 11 specs including in-review ones (INTENT-4 #9, INTENT-2 v3 #4, TRANSFORM-1 #20, CONTEXT-1 #18, CONVERSE-1 #25). Add role-based reading order. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
JarbasAl added a commit that referenced this pull request May 26, 2026
… PIPELINE-1 (#14) * docs: APPENDIX — audit-driven corrections (pipeline + registration model) Applies corrections found by auditing claims against actual OVOS source code: 1. **§6.7 enable/disable_intent legacy names corrected** to the real `mycroft.skill.enable_intent` / `mycroft.skill.disable_intent`. 2. **§6.4 direct-bus-subscribe claim broadened** — verified the standard ovos-padatious-pipeline-plugin and ovos-adapt-pipeline-plugin both subscribe directly to registration topics, not just downstream plugins. 3. **§6.4 "side-effects during match" softened** — audit confirms the official match_* methods are already side-effect-free; the skill-activation emit is orchestrator-side, not plugin-side. Rule reframed as forward-looking discipline. 4. **§3 / §4 / §6.4: PIPELINE-1 *refines* the plugin model rather than *introducing* it.** OVOSPipelineFactory, pipeline_plugins dict, _PIPELINE_MIGRATION_MAP, and the official plugin set already exist. PIPELINE-1's actual contribution narrows to: formalizing the contract, `<owner_id>:<intent_name>` polymorphism, universal `ovos.utterance.handled` end-marker, and the renames. 5. **§3 / §4 / §6.4: tier convention is compatible, not a divergence.** From the bus each tier is already a distinct `pipeline_id` in `Session.pipeline`. How a Python plugin class internally serves multiple `pipeline_id`s (one class with match_high/medium/low methods, an orchestrator-side suffix-decoder, separate plugin instances, etc.) is implementation choice the spec does not constrain. 6. **§4 / §6.4: registrations-are-broadcast is compatible, not a divergence.** OVOS already broadcasts registrations on the bus; plugins already subscribe directly. INTENT-4 does not change this — it only renames topics into the `ovos.intent.*` namespace (see §6.7). Migration is a string replacement. 
What IS new is the orchestrator's passive registration index that backs `ovos.intent.list` / `.describe` — that's added as a separate §6.4 divergence ("new orchestrator responsibility, not a change to existing behaviour"). 7. **§6.6 adds note on engine-specific introspection topics** (`intent.service.adapt.*`, `intent.service.padatious.get`) — plugin-defined surface; spec does not claim authority over them. No spec-body changes; APPENDIX only. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: APPENDIX §6.4 — drop the "dissolution" divergence Same logic as the broadcast-registrations correction: the orchestrator already treats every loaded plugin uniformly, and `IntentHandlerMatch.match_type` is an opaque string the plugin chooses — nothing in current code prevents a plugin from setting `match_type = "<pipeline_id>:<intent_name>"` and being dispatched to itself. The `<owner_id>:<intent_name>` polymorphism PIPELINE-1 names is therefore already supported; the spec only writes down a convention current code allows but does not document. Design rationale around the polymorphism stays in §3/§4 — it is useful explicit naming. But it is not a divergence and should not sit in the divergence catalogue. §6.4 now contains a single real divergence: the orchestrator's new passive registration index backing `ovos.intent.list` / `.describe`. Everything else in §6.4 is forward-looking discipline or a workshop bug, not an architectural change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * APPENDIX: keep session.pipeline (revert the rename row) PIPELINE-1 now keeps the existing `session.pipeline` field name instead of renaming it to `pipeline_stages`. Drop the §6.2 rename row and revert the prose mentions. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * APPENDIX §7: note utterance-transformer chain as a deferred spec (out of scope for PIPELINE-1) * APPENDIX §4 / §7: design notes for OVOS-CONTEXT-1 and OVOS-TRANSFORM-1 Per the new "dedicated APPENDIX PR" policy, consolidating the prior-art and design-deviation notes from the OVOS-CONTEXT-1 (PR #18) and OVOS-TRANSFORM-1 (PR #20) work into this PR. Those spec PRs are now scoped to their own spec files only; the discussion / cross-spec touchups / in-tree prior art all live here. Adds to §4 Design rationale: - "Intent context (CONTEXT-1)" — the Adapt-only origins, the two-scope (private/shared) formalization, jurebes / nebulento / palavreado as prior art for excludes_context, the engine-side §5.3 mutation pathway resolving the PIPELINE-1 §4.2 contradiction. - "Transformer plugins (TRANSFORM-1)" — the architectural- pattern framing, intent transformers as the system-typing home, the nine concrete in-tree plugins as prior art, the ascending-vs-descending priority deviation called out, cancellation alignment with existing plugin convention, and the language disambiguation hierarchy mirroring current ovos-core code paths. Removes from §7 Known gaps: - "Intent context" bullet (formalized in CONTEXT-1). - "The utterance-transformer chain" bullet (formalized in TRANSFORM-1). * APPENDIX: SESSION-1 rationale; introspection patterns; revised divergences §4 — new 'Session (SESSION-1)' rationale subsection: why it exists, prescriptive-not-descriptive scope, omission-as-deferral semantics, four language signals. §4 'Transformer plugins' — language-disambiguation note updated: hierarchy moved out of TRANSFORM-1 to SESSION-1 §3.2; transformer types now just named as natural producers of signals, consolidation is consumer's stage-dependent choice. 
§6.4 architectural divergences — add: handler-trio ownership shifted to orchestrator (third-party handler code carries no obligation); per-pipeline_id intent introspection (PIPELINE-1 §10); CONTEXT-1 scope discriminator. Update ovos.utterance.handled note to reflect the trio-ownership shift (workshop fix is now in the wrapper, not the handler). §6.5.1 (new) — introspection-patterns table comparing INTENT-4, PIPELINE-1, CONTEXT-1, TRANSFORM-1 surfaces. Three shared properties (pull-query is source of truth, no completeness signal, per-process slices under split orchestrators). Notes naming-convention inconsistency as candidate follow-up. §6.6 — remove obsolete 'session shape deferred' note; replace with SESSION-1 ownership statement. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * APPENDIX: update §6.5.1 topic-naming (resolved); add new §6.4 divergences §6.5.1: topic-naming inconsistency is now resolved — all four .list surfaces use ovos.<domain>.<verb>. Update the table and replace the 'not yet uniform' note with a rename log. §6.4: add four new divergence entries: - Skill self-identification on every emission (INTENT-4 §3.1) - recognizer_loop:utterance de-prescribed (PIPELINE-1 §9.1) - .list topics standardized - (keeps the existing scope-discriminator / handler-trio / per-pipeline_id / utterance.handled entries) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * APPENDIX: cleanup — drop draft-history meta-commentary Stand-alone design notes, not a changelog. §4 design rationale: rewrite Session block and TRANSFORM-1 lang bullet to describe current design, not 'moved from earlier draft'. §6.4 divergences: rewrite handler-trio / trio-ownership / scope- discriminator / skill_id-emission / recognizer_loop / topic-naming entries to state current design, not contrast with earlier drafts. §6.5.1 introspection patterns: drop 'in this round' rename note. 
§9 (rewritten 'Design history' → 'The spec set, in three stacks'): drop §9.3 audit-driven-refinement entirely (changelog content); merge §9.1 + §9.2 into one tighter section about how the eight specs partition and what reference implementations exist. §10 compatibility levels: soften 'was previously spoken of at' to 'is spoken of at'; replace the 'no longer describes' framing with a forward-looking 'tuple covering all eight specs is a planned follow-up'. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * APPENDIX: update divergence catalog for CONTEXT-1 key-shape collapse + dispatch stamping §6.4: rewrite the CONTEXT-1 scope-discriminator entry to reflect the bigger change — scope AND origin both collapsed into the key shape. requires_context discriminator is the surviving surface (default private). §6.4: rewrite the skill_id-on-every-emission entry to lead with the structural enforcement (dispatch stamping + forward/reply inheritance), with loader interception as a follow-up rather than the primary path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * APPENDIX: clarify topic-naming claim as prefix-uniform, verb depth varies * APPENDIX §6.5.1: flag the 'intent' word collision across three introspection topics Cross-spec audit B1: 'intent' plays three different roles across the four-spec introspection table — registered intents (INTENT-4), compiled-in-a-matcher intents (PIPELINE-1), and intent-transformer plugins (TRANSFORM-1). The shapes are deliberate and the payloads are distinct, but the topic strings read confusingly at a glance. Added an informative paragraph naming the three meanings and clarifying that ovos.transformer.intent.list follows the per-chain ovos.transformer.<type>.list pattern, where 'intent' is the chain type — not a listing of intents. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * APPENDIX §4 Transformer: design note on the six per-type self-identification keys Document the rationale for TRANSFORM-1 §1.3 claiming six per-type context keys (audio_transformer_id, utterance_transformer_id, ...) rather than a single generic transformer_id. Two arguments: (1) role preservation across the six-stage chain, mirroring the per-type partition that already exists in §1.1 registries, §5 session overrides, and §6 introspection topics; (2) multi-type- plugin disambiguation, since §1.1 permits a single transformer_id across types and a generic context key would erase the role at emit time. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * APPENDIX §4 Transformer: record list-valued attribution, denylist symmetry, and the per-type field-count tradeoff Four design notes capturing the recent TRANSFORM-1 evolution: - Update the existing per-type self-id bullet to reflect the plural list-valued context keys (audio_transformer_ids etc., not the older singular names). - New bullet: list-valued attribution preserves full chain provenance per type; the last entry is the most-recent stamp. Skills and pipelines stay single-string because they originate rather than chain. - New bullet: per-type denylists (six blacklisted_*_transformers) complete the policy surface, mirroring PIPELINE-1's pipeline/blacklisted_pipelines pair. Three-stage composition (preference → availability → policy) parallels PIPELINE-1 §5.5. - New bullet: acknowledge the per-type 'explosion' (12 session fields + 6 context keys), defend the choice against the transformer_<type>:<name> prefix-encoding alternative (direct lookup vs prefix parsing), note that SHOULD-omit makes the common case zero-cost on the wire, and document the object-valued form as a clean fallback if the field count ever proves painful in practice. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * APPENDIX §4 CONTEXT-1: rationale for default-private scope Add design-rationale paragraph explaining why ovos.context.set defaults to private scope when the canonical worked example (Person → Bob) is naturally cross-skill. Three reasons: migration fidelity (current Adapt set_context is effectively skill-private), safer footgun direction (accidental shared-leak is harder to debug than accidental cross-skill miss), and authorability (cross-skill coordination deserves a conscious explicit scope). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * APPENDIX §6: record recognizer_loop:utterance -> ovos.utterance.handle rename Move the entry-topic from §6.1 'already aligned' to §6.4 'architectural divergences' — it is no longer a name kept verbatim, since PIPELINE-1 §9.1 now prescribes ovos.utterance.handle. Rationale paragraph cites the three MSG-1 §2.1.2 naming convention violations: ':' as separator, implementation-role leading segment, missing request/terminal verb pairing. Migration cost spelled out (every audio-input service emits, every intent-service handler subscribes: ovos-dinkum-listener, ovos-simple-listener, ovos-audio, ovos-core/intent_services). §6.7 predecessor-topic table updated. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * APPENDIX: §2.5 Rasa/hassil/ASK/Mycroft comparisons; §6.5.2 session-field + stamp-rule cheat-sheets Two informative additions: - §2.5 (new): extends the §2 comparison set with Rasa, hassil, ASK / Dialogflow, and Mycroft. Locates the CONTEXT-1 design against Rasa's policy-engine-coupled forms; locates TRANSFORM-1 §3.4 against ASK/Dialogflow built-in entity types as the injectable open contract; documents Mycroft as the predecessor whose ad-hoc model the spec family formalizes. 
- §6.5.2 (new): session-field cheat-sheet consolidating the 26 fields claimed across SESSION-1, PIPELINE-1, TRANSFORM-1, and CONTEXT-1 into a single reference table — owner spec, role (preference / policy / signal / identity), empty-array semantics. Followed by a stamp-rule cheat-sheet covering the three component-identity context-key surfaces (skill_id, pipeline_id, <type>_transformer_ids) and their behaviour across origination, .reply / .response, and .forward. Both reduce cross-spec bouncing for implementers. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * APPENDIX: reorganize from 10 sections to 7, restructure for flow The appendix had become a dumping ground after multiple rounds of additions. Restructured with clear narrative flow: §1 About the OVOS specifications — formalization framing, the three-stack overview (was §9), compatibility levels (was §10), reference implementations + ecosystem tooling (folds in ovos-spec-tools from §9 and ovos-localize from §8). §2 Comparison with other voice-assistant systems — merges the HA/Rhasspy material (was §2) with the Rasa/ASK/ Dialogflow/Mycroft/hassil material (was §2.5) into a single comparator section, ordered by relevance: HA & Rhasspy (shared lineage) → open-vs-closed structural argument → Mycroft (predecessor) → Rasa (CONTEXT-1 comparator) → ASK/Dialogflow → hassil (grammar-only) → summary of where OVOS leads/follows/differs. §3 Architectural patterns — the bus as substrate (was §5) and the pipeline-plugin model (was §3) grouped as the two cross-cutting architectural moves. Bus-substrate section gains an explicit subsection on the layer-2 authorization story (preference / policy split). §4 Design rationale, per specification — was §4 itself but now systematically per-spec (INTENT-1+2+3 grouped, MSG-1, SESSION-1, INTENT-4, PIPELINE-1, CONTEXT-1, TRANSFORM-1). 
Stale references purged; recently added rationales (most-specific-wins precedence, bidirectional lang propagation, per-type denylists, etc.) folded in. §5 Where the specs differ from current OVOS code — was §6 but reorganized: removed the §6.5.1 introspection- patterns table and §6.5.2 cheat-sheets (they aren't divergences from code, they're implementer reference — moved to §6). Renumbered to §5.1–§5.7. §6 Implementer reference — new top-level section gathering the cross-spec reference tables that were scattered: topic-name conventions (with the 'intent' overload clarification), session-field cheat-sheet, component-identity stamp-rule cheat-sheet, introspection patterns table. These don't belong inside a 'divergences from code' section; they're how-to material for fresh implementers. §7 Known gaps and planned work — unchanged content, last section. Trimmed stale entries about CONTEXT-1 and TRANSFORM-1 as 'planned' (they've shipped); added conversation-level evaluation infrastructure as a gap. Net: same content, far more navigable. Cross-references updated throughout. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * APPENDIX §2: drop Mycroft comparator subsection; renumber 2.4-2.7 to 2.3-2.6 Mycroft AI Inc shut down in 2023; the fork is years old and the intervening design is not Mycroft's. Keeping a 'comparison to predecessor' subsection over-attributes the architecture and mis-frames OVOS as a derivative project rather than a long- running open project in its own right. Section §2 is now a comparison with currently-relevant voice-assistant systems only: - §2.1 Home Assistant and Rhasspy (shared grammar lineage) - §2.2 Closed domain vs open ecosystem - §2.3 Rasa - §2.4 Amazon ASK / Google Dialogflow - §2.5 hassil - §2.6 Summary Collateral: dropped Mycroft from the project-name list in the intro and from the comparator enumeration in the §2.6 summary. 
Legacy topic strings that happen to contain 'mycroft' in their literal name remain in the §5 divergence tables and §5.7 predecessor-topic mapping as factual code references. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * APPENDIX §3.3: external-protocol interoperability injection points Make the family's interop story explicit rather than implied. New §3.3 catalogues three injection points where external protocols plug into the spec family: 1. Pipeline plugins as the dispatch-layer adapter — LLM APIs (OpenAI Chat Completions and compatible), deterministic template matchers (hassil), external intent classifiers, agent-tool protocols (MCP). 2. Transformer chains as the artifact-pipeline adapter — bidirectional translation, STT validators, content-policy filters, acoustic-event detectors. 3. Bus boundary as the wire-level adapter — Wyoming bridges, MQTT-based stacks, HiveMind-style layer-2 substrates. Per-protocol notes for Wyoming, OpenAI, MCP, hassil, MQTT, A2A — naming where each plugs in. The single-flip routing and no-central-state stance (§3.1) are what make the bus-boundary adapter feasible without modifying the assistant core. Concrete suggestion: a translation tool between OVOS-INTENT-2 locale resources and HA's hassil/intents YAML would let the two corpora cross-pollinate mechanically. Added to §7 known gaps as planned tooling. The three injection points are intentionally not exhaustive — they're the points the spec family deliberately keeps clean. A protocol needing deeper integration is a signal of architectural overlap rather than complementarity. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * APPENDIX: add CONVERSE-1 to orchestrator-stack narrative; close multi-turn gap OVOS-CONVERSE-1 (PR #25) fills the multi-turn conversation gap that §7 previously listed as planned work. Update §1.2 stack description to include it, and drop the §7 gap entry. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * APPENDIX §5.3, §5.4: update for PIPELINE-1 §4.2 relaxation + §7.0 polymorphism collapse Two divergence-catalogue entries updated to reflect the PIPELINE-1 restructure: - The §5.4 'side-effect-free during match' entry is rewritten as 'match contract is the single obligation' — match's only MUST is returning Match-or-null; bus emissions during match are allowed; session mutation during match is via Match.updated_session (explicit channel). - New §5.4 entry: 'Match.updated_session as the match-phase session channel' — promotes the existing ovos-core code pattern `sess = match.updated_session or SessionManager.get(message)` to a normative Match field. Claiming plugin's mutations land; declined plugin's mutations drop at the boundary. - The §5.3 'Dispatch payload uses polymorphic owner_id' entry is rewritten as 'unified owner_id' — reflects PIPELINE-1 §7.0's collapse to two handler-owner shapes (plain skill, pipeline plugin with bundled handlers where pipeline_id == skill_id) plus the pure-matcher recognition. Notes the conceptual mapping skill_id ≈ voice_app_id, pipeline_id ≈ matching-engine id. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * APPENDIX §1.2, §7: SESSION-2 fills the lifecycle gap OVOS-SESSION-2 (in flight at PR #27) defines session lifecycle and state ownership. Update: - §1.2 orchestrator-stack narrative adds SESSION-2 to the stack description with one-line summary of its scope (stateless orchestrator for named sessions, orchestrator-owned default session, projection mandate). - §7 gap entry rewritten: SESSION-2 lands the lifecycle piece; what remains deferred is the set of session preference fields that need to be claimed under SESSION-1 §2.1 by their owning specs (preferences / OCP / persona / locale). 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* APPENDIX §1.2: SESSION-2 narrative: SHOULD-project + MAY-internal (not 'mandate')

Sync with SESSION-2 §2.4 relaxation (commit 6a882c8). The projection pathway is SHOULD-when-practical; plugins MAY hold internal state with full lifecycle ownership and best-effort resumption.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* APPENDIX §5.2.1: document ovos.session.sync / update_default for removal

These ovos-core topics are not defined by any spec. SESSION-2 §6.4 explicitly avoids naming them. They should be retired in favour of clients reading session state from normal Message flow (ovos.utterance.handled or any other session-carrying Message).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* README + APPENDIX §1.0: establish voice OS framing

README intro replaced: "voice assistant ecosystem" → "voice operating system" with an OS-analogy table (scheduler, IPC, shared memory, process supervision, loadable modules, syscall ABI).

APPENDIX §1.0 (new): The voice operating system concept. Two conflations addressed: (1) voice assistant product (closed, vertically integrated vs open platform); (2) LLM wrapper (LLMs fit as pipeline plugins and utterance/dialog/metadata transformers: one possible multi-role deployment, not the architecture itself).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* revert: move README voice-OS framing to its own PR (#28)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* APPENDIX: fix stale PIPELINE-1 refs; slim redundant prose

- owner_id → skill_id throughout (§3.2, §3.3, §4.5, §4.6)
- match(utterance,…) → match(utterances,…) (§4.5)
- Match.captures → Match.slots (§4.7)
- complete_intent_failure → ovos.intent.unmatched in §5.1/§5.3/§5.7; add rename row to §5.2 table
- Dispatch payload block in §5.3 rewritten: {lang, utterance, slots}, handler-lifecycle uses {skill_id, intent_name, optional exception}
- §5.5: add ovos.intent.unmatched and ovos.utterance.speak entries
- §2.5 hassil: drop standalone subsection; fix §2 intro cross-ref
- §1.3 compat levels: condense to bullets
- §1.4: drop ovos-localize "honest notes" paragraph
- §3.1.3: trim to essential bus-substrate mechanics
- §4.7: trim per-type-explosion and per-type-self-id bullets
- §5.4: trim rename and match-contract entries

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- §3.3: OVOS-CONVERSE-1 §3.4 → §3.3 (metadata-transformer hook)
- §3.3: CONVERSE-1 §5.4 → §5.3 (cancellation semantics)
- §5.1: OVOS-SESSION-1 §2.5 → §2.1 (deployment-default rule)
…ale and use cases to appendix §4.7

- Move 'Why this injection point' rationale blocks (§3.1–§3.6) to appendix/rationale.md §4.7
- Move 'Canonical use cases' and 'Where LLMs fit' lists to appendix
- Move 'Cross-cutting concerns' opening paragraph to appendix
- Move cancellation use-case list (§8) to appendix
- Move introspection aggregate-query rationale and consumer list to appendix
- Collapse §3.0 lang parameter redundant bullets into compact forms
- Collapse §5.1 'propagates unchanged' column to prose footnote-style
- Add [Informative] heading to §8.1 cancel_reason vocabulary table
- Add routing-key mutation to §9 conformance obligations
- Add observer conformance entry to §9
- Collapse §1.3 self-identification (cut provenance/coexistence examples)
- Fix TTS introspection topic typo (list.list.response → list.response)
- Fix §4.8→§4.7 cross-references (TRANSFORM-1 rationale lives in §4.7)

Co-Authored-By: opencode/glm-5.1 <noreply@opencode.ai>
- Add per-type injection-point rationale (why each is the only point for its class of work)
- Add per-type canonical use cases (audio, utterance, metadata, intent, dialog, TTS)
- Add per-type LLM-fit notes
- Add cross-cutting architectural value rationale
- Add cancellation in-spec use cases
- Add introspection surface rationale (no aggregate query, typical consumers)

Content moved from transformer.md normative body per editorial overhaul: rationale and use-case lists belong in the appendix, not in the prescriptive spec.

Co-Authored-By: opencode/glm-5.1 <noreply@opencode.ai>
Match.owner_id → Match.skill_id in 5 locations. skill_id is the universal handler identity per the architecture model.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
26 tasks
Companion issue: #19
Summary
Defines a single normative contract for all six transformer plugin types that run at fixed injection points around the utterance lifecycle.
What the spec covers
- `session.{audio,utterance,metadata,intent,dialog,tts}_transformers` fields
- `ovos.transformer.{type}.list` / `.list.response`
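As a rough illustration of the naming patterns listed above, the sketch below expands the `{type}` placeholder for each of the six transformer types. The helper names (`session_field`, `introspection_topics`) are hypothetical and not part of the spec; only the field and topic patterns are taken from this description.

```python
# Hypothetical helpers; only the naming patterns come from the PR description.
TRANSFORMER_TYPES = ("audio", "utterance", "metadata", "intent", "dialog", "tts")


def _check(kind: str) -> None:
    """Reject anything that is not one of the six transformer types."""
    if kind not in TRANSFORMER_TYPES:
        raise ValueError(f"unknown transformer type: {kind!r}")


def session_field(kind: str) -> str:
    """Name of the session field for one type, e.g. 'tts_transformers'."""
    _check(kind)
    return f"{kind}_transformers"


def introspection_topics(kind: str) -> tuple[str, str]:
    """(request, response) topic names for one transformer type."""
    _check(kind)
    request = f"ovos.transformer.{kind}.list"
    return request, f"{request}.response"


if __name__ == "__main__":
    for kind in TRANSFORMER_TYPES:
        print(session_field(kind), *introspection_topics(kind))
```

Note that the response topic is the request topic plus a single `.response` suffix, which matches the `list.list.response → list.response` typo fix mentioned in the commit log.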