feat: ActionTrace export contract (#94) + property-based invariant tests (#99) #115
Open
dgenio wants to merge 2 commits into
…iant tests (#99)

#94: Action trace export
- Add `export_action_trace` / `export_action_traces`: a stable, versioned, JSON-serialisable shape for `ActionTrace` records so downstream tools (e.g. LessonWeaver-style lesson extraction) can consume the audit trail without depending on internals.
- The export is derived only from already-redaction-safe trace fields (redacted args, post-firewall `result_summary`), so it cannot widen the I-01 boundary.
- Optional human-correction metadata attaches at export time.
- Denied requests never produce a trace (policy gates before invoke).
- Record the invoked capability's sensitivity on `ActionTrace`.
- New `docs/trace_export.md` (incl. how it differs from the OTel export) and `examples/trace_export_demo.py` (one succeeded + one failed action), wired into `make ci`.

#99: Property-based tests (`tests/test_policy_properties.py`, Hypothesis)
- Stable reason code on every allow/deny.
- `max_rows` never exceeds the policy cap.
- Handle expansion never exceeds the original grant (indirect-use scenario).
- Tokens never verify outside scope, and tampered/expired tokens are rejected.
- Policy traces never leak raw scope values.
- Trace export is always JSON-serialisable.

Adds `hypothesis` as a dev dependency.

Validated: ruff check, ruff format --check, mypy (41 files), pytest (581 passed, 1 skipped; test_mcp_driver skipped because mcp is not installable locally), and the example list, all green.
Pull request overview
Adds a stable, versioned ActionTrace export contract for downstream audit/learning tooling and introduces Hypothesis-based property tests to continuously validate key authorization, redaction, and token-scope invariants across generated inputs.
Changes:
- Introduces `export_action_trace` / `export_action_traces` with a versioned JSON envelope and documents the contract (plus a runnable demo).
- Extends `ActionTrace` to record the invoked capability's `sensitivity` and plumbs it through invoke + streaming trace recording.
- Adds property-based tests (Hypothesis) covering reason-code stability, row-cap enforcement, handle expansion constraints, token scope binding, redaction non-leakage, and JSON-serialisable trace exports; wires the demo into CI.
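To make the "versioned JSON envelope" idea concrete, here is a minimal sketch of what such an export could look like. It is not the code from this PR: the `Trace` dataclass, the `Sensitivity` enum, the field names, and the `export_trace` / `export_traces` helpers are all hypothetical stand-ins, and the real `ActionTrace` carries more fields.

```python
import json
from dataclasses import dataclass
from enum import Enum
from typing import Any, Optional

EXPORT_VERSION = "1"  # assumption: mirrors the idea of TRACE_EXPORT_VERSION


class Sensitivity(str, Enum):
    # Hypothetical tags; the real SensitivityTag values are not shown in the PR.
    NONE = "none"
    PII = "pii"


@dataclass
class Trace:
    # Hypothetical, trimmed-down stand-in for ActionTrace.
    action_id: str
    capability_id: str
    args: dict[str, Any]            # assumed already redacted at record time
    result_summary: dict[str, Any]  # assumed post-firewall counts/flags only
    error: Optional[str] = None
    sensitivity: Sensitivity = Sensitivity.NONE


def export_trace(trace: Trace, correction: Optional[dict[str, Any]] = None) -> dict[str, Any]:
    """Build a JSON-serialisable record from already-safe fields only."""
    record: dict[str, Any] = {
        "version": EXPORT_VERSION,
        "action_id": trace.action_id,
        "capability_id": trace.capability_id,
        # Denials never produce a trace, so only two statuses are possible.
        "status": "failed" if trace.error is not None else "succeeded",
        "sensitivity": trace.sensitivity.value,
        "args": dict(trace.args),
        "result_summary": dict(trace.result_summary),
        "error": trace.error,
    }
    if correction is not None:
        # Optional human-correction metadata is attached at export time.
        record["correction"] = dict(correction)
    return record


def export_traces(traces: list[Trace]) -> dict[str, Any]:
    """Wrap several records in one versioned envelope."""
    return {"version": EXPORT_VERSION, "traces": [export_trace(t) for t in traces]}


ok = Trace("a1", "billing.list", {"limit": 10}, {"rows": 3})
bad = Trace("a2", "billing.list", {"limit": 10}, {}, error="driver timeout")
envelope = export_traces([ok, bad])
payload = json.dumps(envelope)  # raises TypeError if anything is not serialisable
```

Building the record from an explicit allow-list of fields, rather than dumping the whole object, is what keeps an export like this from widening the redaction boundary when new internal fields are added later.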
Reviewed changes
Copilot reviewed 16 out of 16 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| tests/test_trace.py | Expands trace tests to cover export shape, correction attachment, and end-to-end memory payload redaction. |
| tests/test_policy_properties.py | Adds Hypothesis property tests for policy/token/handle/redaction/export invariants. |
| src/weaver_kernel/trace.py | Defines the stable trace export schema/version and the export functions alongside TraceStore. |
| src/weaver_kernel/models.py | Adds ActionTrace.sensitivity to persist capability sensitivity at record time. |
| src/weaver_kernel/kernel/_stream.py | Records sensitivity on streaming traces. |
| src/weaver_kernel/kernel/_invoke.py | Plumbs resolved Capability into invoke path and records sensitivity on success/failure traces. |
| src/weaver_kernel/kernel/__init__.py | Passes resolved capability into perform_invoke. |
| src/weaver_kernel/__init__.py | Exposes export contract symbols/functions in the public package API. |
| pyproject.toml | Adds hypothesis to dev dependencies. |
| Makefile | Runs the new trace export demo as part of make example / make ci. |
| examples/trace_export_demo.py | Demonstrates exporting traces (success + driver failure) and attaching correction metadata. |
| docs/trace_export.md | Documents the export envelope/fields, privacy guarantees, and how it differs from OTel export. |
| docs/architecture.md | Updates TraceStore architecture docs to mention sensitivity + export contract. |
| CHANGELOG.md | Notes the new export contract, sensitivity field, and property-based tests. |
| AGENTS.md | Adds the trace export contract doc to the canonical instruction index. |
| .github/workflows/ci.yml | Executes the new demo script in CI. |
    args: dict[str, Any]
    response_mode: ResponseMode
    driver_id: str
    sensitivity: SensitivityTag = SensitivityTag.NONE
Comment on lines +139 to +142

> `TRACE_EXPORT_VERSION` is bumped only on a **breaking** change to the field
> shape. New optional fields may be added without a bump, so consumers should
> ignore unknown keys. Assert on `status`, `sensitivity`, and `reason`/`error`
> rather than on human-readable strings, which may evolve.
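A consumer that follows this stability guidance might look like the sketch below. The `version` key name, the set of supported versions, and the `summarize` helper are assumptions for illustration; only `status`, `sensitivity`, and `error` are taken from the discussion in this PR.

```python
import json
from typing import Any

SUPPORTED_VERSIONS = {"1"}  # assumption: the version is carried as a string
KNOWN_KEYS = ("status", "sensitivity", "error")


def summarize(raw: str) -> dict[str, Any]:
    """Read one exported record, tolerating keys added in later releases."""
    record = json.loads(raw)
    version = record.get("version")  # hypothetical key name
    if version not in SUPPORTED_VERSIONS:
        # A bumped version signals a breaking change to the field shape.
        raise ValueError(f"unsupported trace export version: {version!r}")
    # Pick only the keys we understand; unknown keys are ignored on purpose.
    return {key: record.get(key) for key in KNOWN_KEYS}


raw_record = json.dumps(
    {
        "version": "1",
        "status": "failed",
        "sensitivity": "none",
        "error": "driver timeout",
        "future_field": {"added": "later"},  # unknown to this consumer
    }
)
summary = summarize(raw_record)
```

Branching on `summary["status"] == "failed"` stays stable across releases, whereas matching on the text of `error` would break as soon as a message is reworded.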
- docs/trace_export.md: drop the stale `reason` field reference in the Stability section (the export shape only has `error`).
- models.py: declare ActionTrace.sensitivity last so adding it does not shift the positional __init__ order of pre-existing public fields.
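The positional-order concern in the second item can be reproduced with plain dataclasses. The classes below are hypothetical and much smaller than `ActionTrace`; they only show why a new defaulted field is safer at the end of the declaration.

```python
from dataclasses import dataclass


@dataclass
class TraceBefore:
    # Hypothetical shape before the change.
    action_id: str
    driver_id: str
    error: str = ""


@dataclass
class TraceShifted:
    # New field inserted in the middle: later positional arguments shift.
    action_id: str
    driver_id: str
    sensitivity: str = "none"
    error: str = ""


@dataclass
class TraceAppended:
    # New field declared last: existing positional calls keep their meaning.
    action_id: str
    driver_id: str
    error: str = ""
    sensitivity: str = "none"


# The same positional call that worked against TraceBefore.
legacy_args = ("a1", "sql", "boom")

before = TraceBefore(*legacy_args)
shifted = TraceShifted(*legacy_args)    # "boom" silently lands in sensitivity
appended = TraceAppended(*legacy_args)  # "boom" still lands in error
```

With the field in the middle, the legacy call does not raise; it silently stores the error text as the sensitivity, which is the kind of bug that is easy to miss in review.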
Implements the recommended combined-PR group from issue triage: #94 (export action traces) and #99 (property-based tests). The two reinforce each other — #94 defines a stable, redaction-safe trace contract and #99 proves the audit/policy invariants that contract relies on.
What changed

#94: Action trace export contract
- `export_action_trace` / `export_action_traces` (`trace.py`): a stable, versioned (`TRACE_EXPORT_VERSION = "1"`), JSON-serialisable shape for `ActionTrace` records so downstream tools (e.g. LessonWeaver-style lesson extraction) can consume the audit trail without depending on internals.
- The export is derived only from already-redaction-safe trace fields: redacted `args` (memory payloads stripped at record time) and `result_summary` (post-firewall counts/flags), so exporting cannot widen the I-01 boundary.
- A denied request never produces a trace (policy gates before invoke, I-02), so `status` is `succeeded`/`failed`; denials surface via `explain_denial`.
- `ActionTrace` now records the invoked capability's `sensitivity` (plumbed through the invoke + stream paths).
- `docs/trace_export.md` (incl. how it differs from the OpenTelemetry export) + `examples/trace_export_demo.py` (one succeeded + one failed action), wired into `make ci`.

#99: Property-based invariant tests
- `tests/test_policy_properties.py` (Hypothesis) asserting, across generated principals/capabilities/scopes/constraints/handles/tokens:
  - every allow/deny carries a stable reason code;
  - `max_rows` never exceeds the policy cap;
  - handle expansion never exceeds the original grant (indirect-use scenario);
  - tokens never verify outside scope, and tampered/expired tokens are rejected;
  - policy traces never leak raw scope values;
  - trace export is always JSON-serialisable.
- Adds `hypothesis` as a dev dependency.

Why

Closes the audit/trace gap in #94 (the `TraceStore` had no export contract) and the test-rigor gap in #99 (no property-based coverage). Grouped because they share the audit/trace surface (`trace.py`, `ActionTrace`, redaction) and the export's redaction guarantee is exactly what the property tests verify.

How verified
- `ruff check src/ tests/ examples/`: clean
- `ruff format --check`: 77 files already formatted
- `mypy src/`: Success: no issues found in 41 source files
- `pytest`: 581 passed, 1 skipped (`test_mcp_driver` skipped: `mcp` not installable in this sandbox due to an unrelated PyJWT/RECORD conflict; unaffected by this change)
- The example list (incl. `trace_export_demo.py`) runs clean

Risks / caveats
- `ActionTrace` gains a `sensitivity` field (defaults to `NONE`); traces constructed directly keep working. No retro-compat concern per the request.
- Adds `hypothesis` to the dev extras only (no runtime dependency added).

Scope: limited to #94 + #99. Closes #94, closes #99.
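For readers unfamiliar with property-style testing, the `max_rows` invariant from the #99 list can be sketched as follows. This is not the test from `tests/test_policy_properties.py`: it swaps Hypothesis for the standard-library `random` module so it runs without extra dependencies, and `effective_max_rows` is a hypothetical clamp rather than the kernel's policy engine.

```python
import random
from typing import Optional


def effective_max_rows(requested: Optional[int], cap: int) -> int:
    """Hypothetical helper: clamp a requested row limit to the policy cap."""
    if requested is None or requested <= 0:
        # No usable request: fall back to the cap itself.
        return cap
    return min(requested, cap)


def check_cap_invariant(trials: int = 1000, seed: int = 0) -> int:
    """Randomized stand-in for a Hypothesis property: granted rows <= cap."""
    rng = random.Random(seed)  # seeded so a failure is reproducible
    for _ in range(trials):
        cap = rng.randint(1, 10_000)
        requested = rng.choice([None, rng.randint(-10, 100_000)])
        granted = effective_max_rows(requested, cap)
        # The invariant must hold for every generated input, not just examples.
        assert 1 <= granted <= cap, (requested, cap, granted)
    return trials


checked = check_cap_invariant()
```

Hypothesis adds input shrinking and smarter edge-case generation on top of this pattern, which is why the PR takes it on as a dev dependency rather than hand-rolling loops like the one above.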
https://claude.ai/code/session_01A1SNbC3izrsX4JggWWVSAB
Generated by Claude Code