feat: ActionTrace export contract (#94) + property-based invariant tests (#99) by dgenio · Pull Request #115 · dgenio/agent-kernel

dgenio · 2026-06-05T18:25:38Z

Implements the recommended combined-PR group from issue triage: #94 (export action traces) and #99 (property-based tests). The two reinforce each other — #94 defines a stable, redaction-safe trace contract and #99 proves the audit/policy invariants that contract relies on.

What changed

#94 — Action trace export contract

export_action_trace / export_action_traces (trace.py): a stable, versioned (TRACE_EXPORT_VERSION = "1"), JSON-serialisable shape for ActionTrace records so downstream tools (e.g. LessonWeaver-style lesson extraction) can consume the audit trail without depending on internals.
Derived only from already-redaction-safe trace fields, namely args (memory payloads stripped at record time) and result_summary (post-firewall counts/flags), so exporting cannot widen the I-01 boundary. A denied request never produces a trace (policy gates before invoke, I-02), so status is always either succeeded or failed; denials surface via explain_denial instead.
ActionTrace now records the invoked capability's sensitivity (plumbed through the invoke + stream paths).
Optional human-correction metadata can be attached at export time.
New docs/trace_export.md (incl. how it differs from the OpenTelemetry export) + examples/trace_export_demo.py (one succeeded + one failed action), wired into make ci.
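The export contract can be pictured with a minimal sketch. Only the names export_action_trace, export_action_traces, TRACE_EXPORT_VERSION, sensitivity, args, result_summary, error, the succeeded/failed statuses and the optional correction metadata come from this PR. The simplified ActionTrace stand-in, its action_id/capability_id fields, the `version`/`traces` envelope keys and the `correction` key are assumptions for illustration, not the real weaver_kernel API.

```python
from __future__ import annotations

import json
from dataclasses import dataclass
from enum import Enum
from typing import Any

TRACE_EXPORT_VERSION = "1"


class SensitivityTag(str, Enum):
    NONE = "none"  # the default named in this PR
    PII = "pii"  # hypothetical extra tag for illustration


@dataclass
class ActionTrace:
    """Simplified stand-in for weaver_kernel's ActionTrace (hypothetical fields)."""

    action_id: str
    capability_id: str
    args: dict[str, Any]  # assumed already redacted at record time
    result_summary: dict[str, Any]  # assumed post-firewall counts/flags
    error: str | None = None
    sensitivity: SensitivityTag = SensitivityTag.NONE


def export_action_trace(
    trace: ActionTrace, correction: dict[str, Any] | None = None
) -> dict[str, Any]:
    """Project one trace onto a flat, JSON-serialisable record."""
    record: dict[str, Any] = {
        "action_id": trace.action_id,
        "capability_id": trace.capability_id,
        # Denials are gated before invoke, so a trace is only ever one of these two.
        "status": "failed" if trace.error is not None else "succeeded",
        "sensitivity": trace.sensitivity.value,
        # Copy only fields that are already redaction-safe; nothing raw is added here.
        "args": dict(trace.args),
        "result_summary": dict(trace.result_summary),
        "error": trace.error,
    }
    if correction is not None:
        # Optional human-correction metadata is attached at export time.
        record["correction"] = dict(correction)
    return record


def export_action_traces(traces: list[ActionTrace]) -> dict[str, Any]:
    """Wrap exported records in a versioned envelope (envelope keys are assumed)."""
    return {
        "version": TRACE_EXPORT_VERSION,
        "traces": [export_action_trace(t) for t in traces],
    }


# Usage: one succeeded and one failed action, as in the demo described above.
ok = ActionTrace("a1", "invoices.list", {"limit": 10}, {"row_count": 10})
failed = ActionTrace("a2", "invoices.list", {"limit": 10}, {}, error="driver timeout")
payload = export_action_traces([ok, failed])
serialised = json.dumps(payload)  # would raise TypeError if anything were not JSON-safe
```

Building each record field by field from redaction-safe inputs, rather than serialising the whole object, is one way to keep a future internal field from leaking into the export by accident.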

#99 — Property-based invariant tests

New tests/test_policy_properties.py (Hypothesis) asserting, across generated principals/capabilities/scopes/constraints/handles/tokens:
- every allow/deny carries a stable reason code;
- max_rows never exceeds the policy cap;
- handle expansion never exceeds the original grant (indirect-use scenario);
- tokens never verify outside their scope; tampered/expired tokens are always rejected;
- policy traces never leak raw scope values;
- the trace export is always JSON-serialisable.
Adds hypothesis as a dev dependency.
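The shape of these invariants can be sketched without the weaver_kernel policy engine. To keep the sketch dependency-free it substitutes a seeded stdlib `random` loop for Hypothesis's generated inputs; `decide`, `Decision`, `POLICY_MAX_ROWS` and the reason-code set are hypothetical names, not the PR's actual code. It checks two of the listed properties: every decision carries a code from a fixed set, and the granted row count never exceeds the cap.

```python
import random
from dataclasses import dataclass

POLICY_MAX_ROWS = 1_000  # hypothetical policy cap
REASON_CODES = frozenset({"allowed", "allowed_capped", "denied_scope", "denied_invalid"})


@dataclass(frozen=True)
class Decision:
    allowed: bool
    reason_code: str
    max_rows: int


def decide(requested_rows: int, scope_ok: bool) -> Decision:
    """Hypothetical policy: deny out-of-scope or invalid requests, clamp rows to the cap."""
    if not scope_ok:
        return Decision(False, "denied_scope", 0)
    if requested_rows < 0:
        return Decision(False, "denied_invalid", 0)
    if requested_rows > POLICY_MAX_ROWS:
        return Decision(True, "allowed_capped", POLICY_MAX_ROWS)
    return Decision(True, "allowed", requested_rows)


def check_policy_invariants(examples: int = 500, seed: int = 0) -> int:
    """Stand-in for a Hypothesis @given test: assert invariants on random inputs."""
    rng = random.Random(seed)
    for _ in range(examples):
        requested = rng.randint(-10, 10 * POLICY_MAX_ROWS)
        scope_ok = rng.random() < 0.8
        d = decide(requested, scope_ok)
        # Invariant 1: every allow/deny carries a stable reason code.
        assert d.reason_code in REASON_CODES, d
        # Invariant 2: max_rows never exceeds the policy cap.
        assert 0 <= d.max_rows <= POLICY_MAX_ROWS, d
    return examples


checked = check_policy_invariants()
```

With Hypothesis installed, the loop would typically become a test decorated with `@given(st.integers(), st.booleans())`, which also shrinks any failing input to a minimal counterexample.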

Why

Closes the audit/trace gap in #94 (the TraceStore had no export contract) and the test-rigor gap in #99 (no property-based coverage). Grouped because they share the audit/trace surface (trace.py, ActionTrace, redaction) and the export's redaction guarantee is exactly what the property tests verify.

How verified

ruff check src/ tests/ examples/ — clean
ruff format --check — 77 files already formatted
mypy src/ — Success: no issues found in 41 source files
pytest — 581 passed, 1 skipped (test_mcp_driver skipped: mcp not installable in this sandbox due to an unrelated PyJWT/RECORD conflict; unaffected by this change)
Example list (incl. trace_export_demo.py) runs clean

Risks / caveats

ActionTrace gains a sensitivity field (defaults to NONE), so traces constructed directly keep working. No backward-compatibility concern per the request.
Adds hypothesis to the dev extras only (no runtime dependency added).

Scope: limited to #94 + #99. Closes #94, closes #99.

https://claude.ai/code/session_01A1SNbC3izrsX4JggWWVSAB

Generated by Claude Code

…iant tests (#99)

#94 (action trace export):
- Add export_action_trace / export_action_traces: a stable, versioned, JSON-serialisable shape for ActionTrace records so downstream tools (e.g. LessonWeaver-style lesson extraction) can consume the audit trail without depending on internals. Derived only from already-redaction-safe trace fields (redacted args, post-firewall result_summary), so it cannot widen the I-01 boundary. Optional human-correction metadata attaches at export time; denied requests never produce a trace (policy gates before invoke).
- Record the invoked capability's sensitivity on ActionTrace.
- New docs/trace_export.md (incl. how it differs from the OTel export) and examples/trace_export_demo.py (one succeeded + one failed action), wired into make ci.

#99 (property-based tests, tests/test_policy_properties.py, Hypothesis):
- Stable reason code on every allow/deny.
- max_rows never exceeds the policy cap.
- Handle expansion never exceeds the original grant (indirect-use scenario).
- Tokens never verify outside scope, and tampered/expired tokens are rejected.
- Policy traces never leak raw scope values.
- Trace export is always JSON-serialisable.

Adds hypothesis as a dev dependency.

Validated: ruff check, ruff format --check, mypy (41 files), pytest (581 passed, 1 skipped; test_mcp_driver skipped because mcp is not installable locally), and the example list, all green.
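The token invariant (tokens never verify outside their scope; tampered or expired tokens are always rejected) can be illustrated with a minimal HMAC-signed token. The token format, `issue_token`/`verify_token` and the secret are hypothetical; weaver_kernel's real token scheme is not shown in this PR and may differ. As above, a seeded stdlib `random` loop stands in for Hypothesis.

```python
import hashlib
import hmac
import json
import random
import string

SECRET = b"demo-secret"  # hypothetical signing key; never hard-code a real one


def _sign(body: str) -> str:
    return hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()


def issue_token(scope: str, ttl_s: float, now: float) -> str:
    """Hypothetical format: JSON claims, a '.', then a hex HMAC-SHA256 signature."""
    body = json.dumps({"scope": scope, "exp": now + ttl_s}, sort_keys=True)
    return f"{body}.{_sign(body)}"


def verify_token(token: str, scope: str, now: float) -> bool:
    # The hex signature contains no '.', so the last '.' is the separator.
    body, _, sig = token.rpartition(".")
    if not hmac.compare_digest(sig, _sign(body)):
        return False  # tampered or malformed
    claims = json.loads(body)
    return claims["scope"] == scope and now < claims["exp"]


def check_token_invariants(examples: int = 300, seed: int = 0) -> int:
    rng = random.Random(seed)
    alphabet = string.ascii_lowercase
    now = 1_000_000.0
    for _ in range(examples):
        scope = "".join(rng.choices(alphabet, k=rng.randint(1, 8)))
        other = scope + rng.choice(alphabet)  # always a different scope
        token = issue_token(scope, ttl_s=60.0, now=now)
        assert verify_token(token, scope, now)  # baseline: valid in scope
        assert not verify_token(token, other, now)  # never verifies outside scope
        assert not verify_token(token, scope, now + 61)  # expired
        # Flip one character anywhere in the token to simulate tampering.
        i = rng.randrange(len(token))
        flipped = "x" if token[i] != "x" else "y"
        tampered = token[:i] + flipped + token[i + 1 :]
        assert not verify_token(tampered, scope, now)
    return examples
```

`hmac.compare_digest` is used instead of `==` so the signature comparison does not leak timing information.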

Copilot

Pull request overview

Adds a stable, versioned ActionTrace export contract for downstream audit/learning tooling and introduces Hypothesis-based property tests to continuously validate key authorization, redaction, and token-scope invariants across generated inputs.

Changes:

Introduces export_action_trace / export_action_traces with a versioned JSON envelope and documents the contract (plus a runnable demo).
Extends ActionTrace to record the invoked capability’s sensitivity and plumbs it through invoke + streaming trace recording.
Adds property-based tests (Hypothesis) covering reason-code stability, row-cap enforcement, handle expansion constraints, token scope binding, redaction non-leakage, and JSON-serialisable trace exports; wires the demo into CI.

Reviewed changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated 2 comments.


| File | Description |
| --- | --- |
| `tests/test_trace.py` | Expands trace tests to cover export shape, correction attachment, and end-to-end memory payload redaction. |
| `tests/test_policy_properties.py` | Adds Hypothesis property tests for policy/token/handle/redaction/export invariants. |
| `src/weaver_kernel/trace.py` | Defines the stable trace export schema/version and the export functions alongside `TraceStore`. |
| `src/weaver_kernel/models.py` | Adds `ActionTrace.sensitivity` to persist capability sensitivity at record time. |
| `src/weaver_kernel/kernel/_stream.py` | Records `sensitivity` on streaming traces. |
| `src/weaver_kernel/kernel/_invoke.py` | Plumbs the resolved `Capability` into the invoke path and records `sensitivity` on success/failure traces. |
| `src/weaver_kernel/kernel/__init__.py` | Passes the resolved capability into `perform_invoke`. |
| `src/weaver_kernel/__init__.py` | Exposes the export contract symbols/functions in the public package API. |
| `pyproject.toml` | Adds `hypothesis` to dev dependencies. |
| `Makefile` | Runs the new trace export demo as part of `make example` / `make ci`. |
| `examples/trace_export_demo.py` | Demonstrates exporting traces (success + driver failure) and attaching correction metadata. |
| `docs/trace_export.md` | Documents the export envelope/fields, privacy guarantees, and how it differs from the OTel export. |
| `docs/architecture.md` | Updates TraceStore architecture docs to mention sensitivity + export contract. |
| `CHANGELOG.md` | Notes the new export contract, sensitivity field, and property-based tests. |
| `AGENTS.md` | Adds the trace export contract doc to the canonical instruction index. |
| `.github/workflows/ci.yml` | Executes the new demo script in CI. |

    args: dict[str, Any]
    response_mode: ResponseMode
    driver_id: str
+    sensitivity: SensitivityTag = SensitivityTag.NONE


+`TRACE_EXPORT_VERSION` is bumped only on a **breaking** change to the field
+shape. New optional fields may be added without a bump, so consumers should
+ignore unknown keys. Assert on `status`, `sensitivity`, and `reason`/`error`
+rather than on human-readable strings, which may evolve.
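A consumer following this stability rule might look like the sketch below: check the version, read only the machine-oriented fields, and ignore unknown keys. The envelope keys `version` and `traces`, the `summarise_export` helper and the `"none"` fallback for a missing sensitivity are assumptions; following the later fix-up commit, failures are read from `error` only, since the export shape has no `reason` field.

```python
from __future__ import annotations

from typing import Any

SUPPORTED_VERSIONS = frozenset({"1"})  # only a breaking change bumps the version


def summarise_export(envelope: dict[str, Any]) -> list[tuple[str, str, str | None]]:
    """Tolerant reader: check the version, read stable fields, ignore the rest."""
    version = envelope.get("version")
    if version not in SUPPORTED_VERSIONS:
        raise ValueError(f"unsupported trace export version: {version!r}")
    summary: list[tuple[str, str, str | None]] = []
    for record in envelope.get("traces", []):
        # Unknown keys (e.g. optional fields added later) are simply not read.
        summary.append(
            (record["status"], record.get("sensitivity", "none"), record.get("error"))
        )
    return summary


sample = {
    "version": "1",
    "traces": [
        {"status": "succeeded", "sensitivity": "none", "error": None, "new_field": 1},
        {"status": "failed", "sensitivity": "pii", "error": "driver timeout"},
    ],
}
rows = summarise_export(sample)
```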


- docs/trace_export.md: drop the stale `reason` field reference in the Stability section (the export shape only has `error`).
- models.py: declare ActionTrace.sensitivity last so adding it does not shift the positional `__init__` order of pre-existing public fields.
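Why the field order matters can be shown with a small stand-in, assuming ActionTrace is a dataclass: the generated `__init__` takes fields in declaration order, so appending a defaulted field keeps existing positional calls valid. The abridged `ActionTrace` below (only the four fields visible in the models.py hunk) and the `ResponseMode.SUMMARY` / `SensitivityTag.PII` members are hypothetical, not the real weaver_kernel definitions.

```python
from dataclasses import dataclass, fields
from enum import Enum
from typing import Any


class SensitivityTag(str, Enum):
    NONE = "none"
    PII = "pii"  # hypothetical extra tag


class ResponseMode(str, Enum):
    SUMMARY = "summary"  # hypothetical member; the real ones are not shown in this PR


@dataclass
class ActionTrace:
    # Abridged stand-in: pre-existing fields first, in their original order...
    args: dict[str, Any]
    response_mode: ResponseMode
    driver_id: str
    # ...and the new defaulted field appended last.
    sensitivity: SensitivityTag = SensitivityTag.NONE


# A pre-existing positional call still works and picks up the default.
legacy = ActionTrace({"q": "x"}, ResponseMode.SUMMARY, "driver-1")
field_order = [f.name for f in fields(ActionTrace)]
```

Declaring the field earlier would either shift later fields' positional slots or, if any later field lacks a default, make class creation fail with a `TypeError` (non-default argument follows default argument).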

Copilot AI review requested due to automatic review settings June 5, 2026 18:25

Copilot started reviewing on behalf of dgenio June 5, 2026 18:25

Copilot AI reviewed Jun 5, 2026

dgenio wants to merge 2 commits into main from claude/github-issues-triage-B9UWd

dgenio commented Jun 5, 2026


Copilot AI left a comment


3 participants
