feat: ActionTrace export contract (#94) + property-based invariant tests (#99)#115

Open
dgenio wants to merge 2 commits into main from claude/github-issues-triage-B9UWd

Conversation

dgenio (Owner) commented Jun 5, 2026

Implements the recommended combined-PR group from issue triage: #94 (export action traces) and #99 (property-based tests). The two reinforce each other — #94 defines a stable, redaction-safe trace contract and #99 proves the audit/policy invariants that contract relies on.

What changed

#94 — Action trace export contract

  • export_action_trace / export_action_traces (trace.py): a stable, versioned (TRACE_EXPORT_VERSION = "1"), JSON-serialisable shape for ActionTrace records so downstream tools (e.g. LessonWeaver-style lesson extraction) can consume the audit trail without depending on internals.
  • Derived only from already-redaction-safe trace fields — args (memory payloads stripped at record time) and result_summary (post-firewall counts/flags) — so exporting cannot widen the I-01 boundary. A denied request never produces a trace (policy gates before invoke, I-02), so status is succeeded/failed; denials surface via explain_denial.
  • ActionTrace now records the invoked capability's sensitivity (plumbed through the invoke + stream paths).
  • Optional human-correction metadata can be attached at export time.
  • New docs/trace_export.md (incl. how it differs from the OpenTelemetry export) + examples/trace_export_demo.py (one succeeded + one failed action), wired into make ci.
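To make the shape of the export contract concrete, here is a minimal sketch of a versioned, JSON-serialisable envelope built only from redaction-safe fields. It is an illustration under stated assumptions, not the code in trace.py: SketchTrace, export_sketch, export_many_sketch, the envelope keys ("version", "traces", "correction") and the field names other than args, result_summary, sensitivity, status and error are all hypothetical.

```python
import json
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Optional

TRACE_EXPORT_VERSION = "1"  # value quoted in the PR description


class SensitivityTag(Enum):
    # Only NONE is confirmed by the PR; the string value is an assumption.
    NONE = "none"


@dataclass
class SketchTrace:
    """Hypothetical, reduced stand-in for ActionTrace."""

    action_id: str
    capability_id: str
    args: dict[str, Any] = field(default_factory=dict)            # redacted at record time
    result_summary: dict[str, Any] = field(default_factory=dict)  # post-firewall counts/flags
    error: Optional[str] = None
    sensitivity: SensitivityTag = SensitivityTag.NONE


def export_sketch(trace: SketchTrace, correction: Optional[dict[str, Any]] = None) -> dict[str, Any]:
    """Build one export record from already-redacted fields only."""
    record: dict[str, Any] = {
        "action_id": trace.action_id,
        "capability_id": trace.capability_id,
        # Denied requests never produce a trace, so only two statuses exist.
        "status": "failed" if trace.error else "succeeded",
        "sensitivity": trace.sensitivity.value,
        "args": dict(trace.args),
        "result_summary": dict(trace.result_summary),
        "error": trace.error,
    }
    if correction is not None:
        # Human-correction metadata is attached at export time, not stored on the trace.
        record["correction"] = dict(correction)
    return record


def export_many_sketch(traces: list[SketchTrace]) -> dict[str, Any]:
    """Wrap records in a versioned envelope so consumers can check compatibility."""
    return {
        "version": TRACE_EXPORT_VERSION,
        "traces": [export_sketch(t) for t in traces],
    }


if __name__ == "__main__":
    ok = SketchTrace("a1", "db.read", {"table": "orders"}, {"row_count": 3})
    bad = SketchTrace("a2", "db.read", error="driver timeout")
    print(json.dumps(export_many_sketch([ok, bad]), indent=2))
```

Because the record is assembled from fields that were redacted before the trace was stored, the export step has no access to raw payloads and cannot widen the redaction boundary.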

#99 — Property-based invariant tests

  • New tests/test_policy_properties.py (Hypothesis) asserting, across generated principals/capabilities/scopes/constraints/handles/tokens:
    • every allow/deny carries a stable reason code;
    • max_rows never exceeds the policy cap;
    • handle expansion never exceeds the original grant (indirect-use scenario);
    • tokens never verify outside their scope; tampered/expired tokens are always rejected;
    • policy traces never leak raw scope values;
    • the trace export is always JSON-serialisable.
  • Adds hypothesis as a dev dependency.
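The invariants listed above can be illustrated with a small randomized check. This sketch deliberately uses a seeded loop from the standard library in place of Hypothesis so it runs with no extra dependency; the PR's tests use Hypothesis for generation and shrinking. clamp_max_rows, sign, verify, the HMAC token format and SECRET are hypothetical stand-ins, not the kernel's policy or token code.

```python
import hashlib
import hmac
import random

SECRET = b"demo-secret"  # hypothetical signing key, for the sketch only


def clamp_max_rows(requested: int, policy_cap: int) -> int:
    """Hypothetical policy helper: never grant more rows than the cap."""
    return max(0, min(requested, policy_cap))


def sign(scope: str, expires_at: int) -> str:
    """Bind a token to a scope and an expiry time with HMAC-SHA256."""
    msg = f"{scope}|{expires_at}".encode()
    return hmac.new(SECRET, msg, hashlib.sha256).hexdigest()


def verify(token: str, scope: str, expires_at: int, now: int) -> bool:
    """Accept only unexpired tokens whose signature matches the given scope."""
    if now >= expires_at:
        return False
    return hmac.compare_digest(token, sign(scope, expires_at))


def check_invariants(trials: int = 500, seed: int = 0) -> int:
    """Assert the invariants over randomly generated inputs; return the trial count."""
    rng = random.Random(seed)
    for _ in range(trials):
        # Invariant: max_rows never exceeds the policy cap.
        requested = rng.randint(-10, 10_000)
        cap = rng.randint(0, 1_000)
        assert clamp_max_rows(requested, cap) <= cap

        # Invariant: tokens verify only inside their own scope and lifetime.
        scope = f"scope-{rng.randint(0, 50)}"
        expires_at = rng.randint(1, 1_000)
        token = sign(scope, expires_at)
        valid_now = expires_at - 1
        assert verify(token, scope, expires_at, now=valid_now)
        assert not verify(token, scope + "-other", expires_at, now=valid_now)
        assert not verify(token, scope, expires_at, now=expires_at)  # expired

        # Invariant: a tampered token is always rejected.
        tampered = ("1" if token[0] == "0" else "0") + token[1:]
        assert not verify(tampered, scope, expires_at, now=valid_now)
    return trials


if __name__ == "__main__":
    print(f"{check_invariants()} trials passed")
```

With Hypothesis, the loop body would become a test function decorated with generated arguments, and a failing case would be shrunk to a minimal counterexample automatically.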

Why

Closes the audit/trace gap in #94 (the TraceStore had no export contract) and the test-rigor gap in #99 (no property-based coverage). Grouped because they share the audit/trace surface (trace.py, ActionTrace, redaction) and the export's redaction guarantee is exactly what the property tests verify.

How verified

  • ruff check src/ tests/ examples/ — clean
  • ruff format --check — 77 files already formatted
  • mypy src/ — Success: no issues found in 41 source files
  • pytest: 581 passed, 1 skipped (test_mcp_driver skipped: mcp not installable in this sandbox due to an unrelated PyJWT/RECORD conflict; unaffected by this change)
  • Example list (incl. trace_export_demo.py) runs clean

Risks / caveats

  • ActionTrace gains a sensitivity field (defaults to NONE); traces constructed directly keep working. No retro-compat concern per the request.
  • Adds hypothesis to the dev extras only (no runtime dependency added).

Scope: limited to #94 + #99. Closes #94, closes #99.

https://claude.ai/code/session_01A1SNbC3izrsX4JggWWVSAB


Generated by Claude Code

…iant tests (#99)

#94 — Action trace export:
- Add export_action_trace / export_action_traces: a stable, versioned,
  JSON-serialisable shape for ActionTrace records so downstream tools (e.g.
  LessonWeaver-style lesson extraction) can consume the audit trail without
  depending on internals. Derived only from already-redaction-safe trace
  fields (redacted args, post-firewall result_summary), so it cannot widen
  the I-01 boundary. Optional human-correction metadata attaches at export
  time; denied requests never produce a trace (policy gates before invoke).
- Record the invoked capability's sensitivity on ActionTrace.
- New docs/trace_export.md (incl. how it differs from the OTel export) and
  examples/trace_export_demo.py (one succeeded + one failed action), wired
  into make ci.

#99 — Property-based tests (tests/test_policy_properties.py, Hypothesis):
- Stable reason code on every allow/deny; max_rows never exceeds the policy
  cap; handle expansion never exceeds the original grant (indirect-use
  scenario); tokens never verify outside scope and tampered/expired tokens
  are rejected; policy traces never leak raw scope values; trace export is
  always JSON-serialisable. Adds hypothesis as a dev dependency.

Validated: ruff check, ruff format --check, mypy (41 files), pytest
(581 passed, 1 skipped; test_mcp_driver skipped — mcp not installable
locally), and the example list all green.

Copilot AI review requested due to automatic review settings June 5, 2026 18:25

Copilot AI left a comment

Pull request overview

Adds a stable, versioned ActionTrace export contract for downstream audit/learning tooling and introduces Hypothesis-based property tests to continuously validate key authorization, redaction, and token-scope invariants across generated inputs.

Changes:

  • Introduces export_action_trace / export_action_traces with a versioned JSON envelope and documents the contract (plus a runnable demo).
  • Extends ActionTrace to record the invoked capability’s sensitivity and plumbs it through invoke + streaming trace recording.
  • Adds property-based tests (Hypothesis) covering reason-code stability, row-cap enforcement, handle expansion constraints, token scope binding, redaction non-leakage, and JSON-serialisable trace exports; wires the demo into CI.

Reviewed changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| tests/test_trace.py | Expands trace tests to cover export shape, correction attachment, and end-to-end memory payload redaction. |
| tests/test_policy_properties.py | Adds Hypothesis property tests for policy/token/handle/redaction/export invariants. |
| src/weaver_kernel/trace.py | Defines the stable trace export schema/version and the export functions alongside TraceStore. |
| src/weaver_kernel/models.py | Adds ActionTrace.sensitivity to persist capability sensitivity at record time. |
| src/weaver_kernel/kernel/_stream.py | Records sensitivity on streaming traces. |
| src/weaver_kernel/kernel/_invoke.py | Plumbs resolved Capability into invoke path and records sensitivity on success/failure traces. |
| src/weaver_kernel/kernel/__init__.py | Passes resolved capability into perform_invoke. |
| src/weaver_kernel/__init__.py | Exposes export contract symbols/functions in the public package API. |
| pyproject.toml | Adds hypothesis to dev dependencies. |
| Makefile | Runs the new trace export demo as part of make example / make ci. |
| examples/trace_export_demo.py | Demonstrates exporting traces (success + driver failure) and attaching correction metadata. |
| docs/trace_export.md | Documents the export envelope/fields, privacy guarantees, and how it differs from OTel export. |
| docs/architecture.md | Updates TraceStore architecture docs to mention sensitivity + export contract. |
| CHANGELOG.md | Notes the new export contract, sensitivity field, and property-based tests. |
| AGENTS.md | Adds the trace export contract doc to the canonical instruction index. |
| .github/workflows/ci.yml | Executes the new demo script in CI. |

Comment thread src/weaver_kernel/models.py Outdated
args: dict[str, Any]
response_mode: ResponseMode
driver_id: str
sensitivity: SensitivityTag = SensitivityTag.NONE
Comment thread docs/trace_export.md Outdated
Comment on lines +139 to +142
`TRACE_EXPORT_VERSION` is bumped only on a **breaking** change to the field
shape. New optional fields may be added without a bump, so consumers should
ignore unknown keys. Assert on `status`, `sensitivity`, and `reason`/`error`
rather than on human-readable strings, which may evolve.
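
A consumer that follows this stability guidance might look like the sketch below: it checks the version, ignores keys it does not know, and branches on `status` and `error` rather than on message text. The envelope keys `version` and `traces`, and the function name summarize_export, are assumptions made for illustration; only `status`, `sensitivity` and `error` are named in this thread.

```python
import json

SUPPORTED_VERSION = "1"  # matches TRACE_EXPORT_VERSION quoted in the PR


def summarize_export(export_json: str) -> dict[str, int]:
    """Count traces by status, tolerating unknown optional keys."""
    envelope = json.loads(export_json)
    # A version bump signals a breaking change, so refuse anything unexpected.
    if envelope.get("version") != SUPPORTED_VERSION:
        raise ValueError(f"unsupported export version: {envelope.get('version')!r}")

    counts = {"succeeded": 0, "failed": 0}
    for record in envelope.get("traces", []):
        # Read only the stable fields; extra keys are ignored on purpose.
        status = record["status"]
        if status not in counts:
            raise ValueError(f"unexpected status: {status!r}")
        # A failed trace is expected to carry an error; do not parse its wording.
        if status == "failed" and not record.get("error"):
            raise ValueError("failed trace without an error field")
        counts[status] += 1
    return counts
```

Reading fields by name and skipping everything else is what lets new optional fields ship without a version bump.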
- docs/trace_export.md: drop the stale `reason` field reference in the
  Stability section (the export shape only has `error`).
- models.py: declare ActionTrace.sensitivity last so adding it does not shift
  the positional __init__ order of pre-existing public fields.
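
Why the field position matters can be shown with a small sketch, assuming ActionTrace is a dataclass-style model whose generated `__init__` takes fields in declaration order. The three classes below are hypothetical reductions; only the field names driver_id and sensitivity and the default SensitivityTag.NONE come from this PR.

```python
from dataclasses import dataclass
from enum import Enum


class SensitivityTag(Enum):
    NONE = "none"  # the string value is an assumption


@dataclass
class TraceBefore:
    """Shape before the change: callers may pass `error` positionally."""

    action_id: str
    driver_id: str
    error: str = ""


@dataclass
class TraceMiddle:
    """New field inserted mid-class: shifts every later positional argument."""

    action_id: str
    driver_id: str
    sensitivity: SensitivityTag = SensitivityTag.NONE
    error: str = ""


@dataclass
class TraceLast:
    """New field declared last: existing positional calls keep their meaning."""

    action_id: str
    driver_id: str
    error: str = ""
    sensitivity: SensitivityTag = SensitivityTag.NONE


if __name__ == "__main__":
    old_call = ("a1", "sql", "boom")  # a pre-existing positional call site
    print(TraceBefore(*old_call).error)  # the error text, as intended
    print(TraceLast(*old_call).error)    # unchanged behaviour
    print(TraceMiddle(*old_call).error)  # empty: "boom" landed in `sensitivity`
```

Dataclasses do not validate types at runtime, so the mid-class variant fails silently: the error string is stored in `sensitivity` and `error` falls back to its default.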
Development

Successfully merging this pull request may close these issues.

  • Add property-based tests for policy bypass and indirect capability use
  • Interop: export action traces for downstream lesson extraction

3 participants