feat(traces): give the trace evaluator data-fair data-exploration tools#37

Merged: albanm merged 11 commits into main from feat-eval-exploration on Jun 19, 2026

@albanm albanm commented Jun 19, 2026

Adds read-only data-exploration tools to the admin trace evaluator so it can check a reviewed session against the actual data instead of judging from the trace text alone.

  • New evaluator-data-tools.ts builds list_datasets, describe_dataset, get_dataset_schema, search_data, aggregate_data, calculate_metric, get_field_values (reused from @data-fair/agent-tools-data-fair) plus an evaluator-specific get_dataset_metadata_raw for the full untrimmed metadata.
  • Tools call the same-origin data-fair API with the reviewer's session cookie, scoped to the conversation's owner account (owner=type:id[:department]), so a superadmin reviewing account X explores X's data. They are always registered and degrade gracefully when data-fair is unreachable.
  • The tools are merged flat into EvaluatorChat's tools, department is threaded through TraceReview, and the evaluator prompt is updated.
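
For readers unfamiliar with the scoping described above, here is a minimal sketch of how an owner=type:id[:department] filter and the "degrade gracefully" behaviour might look. It is not the code from evaluator-data-tools.ts: the `Owner` shape, the `ownerFilter` and `listDatasetsFor` helpers, the `ToolResult` type, and the `/data-fair/api/v1/datasets` path are all assumptions made for illustration.

```typescript
// Hypothetical shape of the conversation owner; field names are assumptions.
interface Owner {
  type: 'user' | 'organization'
  id: string
  department?: string
}

// Build the owner filter value in the `type:id[:department]` form.
function ownerFilter (owner: Owner): string {
  const base = `${owner.type}:${owner.id}`
  return owner.department ? `${base}:${owner.department}` : base
}

// A tool result is either data or a readable error, so the tool never throws.
type ToolResult<T> = { ok: true, data: T } | { ok: false, error: string }

// Minimal response surface used by the sketch (a subset of the fetch Response).
interface MinimalResponse {
  ok: boolean
  status: number
  json: () => Promise<unknown>
}

type FetchLike = (url: string, init?: { credentials?: 'include' }) => Promise<MinimalResponse>

// Hypothetical helper: list datasets scoped to the owner over a same-origin call.
// `fetchFn` is injected so the sketch can run without a browser or a server.
async function listDatasetsFor (owner: Owner, fetchFn: FetchLike): Promise<ToolResult<unknown>> {
  const params = new URLSearchParams({ owner: ownerFilter(owner) })
  try {
    // `credentials: 'include'` sends the reviewer's session cookie with the request.
    const res = await fetchFn(`/data-fair/api/v1/datasets?${params.toString()}`, { credentials: 'include' })
    if (!res.ok) return { ok: false, error: `data-fair responded with status ${res.status}` }
    return { ok: true, data: await res.json() }
  } catch (err) {
    // Network failure: report it to the evaluator instead of breaking the chat.
    return { ok: false, error: `data-fair is unreachable: ${(err as Error).message}` }
  }
}
```

Returning an error value rather than throwing is one way to let the tools stay registered even when data-fair cannot be reached; the actual implementation may handle this differently.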

Why: so the evaluator, when opened in a data-fair context, has data-exploration tools like the normal agent and can extend its evaluation to the real data (e.g. missing dataset descriptions, schema quality, whether a search would return rows).

Heads-up: superadmin cross-account exploration relies on an active adminMode session (no data-fair change); publicationSite scoping is deferred (not captured in traces). Adds the @data-fair/agent-tools-data-fair UI dependency.

albanm and others added 9 commits June 19, 2026 17:34
Read-only data-exploration tools for the trace evaluator, reusing
@data-fair/agent-tools-data-fair here and calling the same-origin
data-fair API scoped to the conversation's owner account. Includes a
raw metadata tool for metadata-quality and tool-coverage evaluation.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…field-values tools

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Also covers the search_data next-pagination branch with a test.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Merge buildEvaluatorDataTools into the evaluator localTools and thread the
conversation owner's department through TraceReview -> EvaluatorChat.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ontract

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
albanm added 2 commits June 19, 2026 18:27
# Conflicts:
#	ui/src/components/EvaluatorChat.vue
#	ui/src/components/TraceReview.vue
# Conflicts:
#	ui/src/components/TraceReview.vue
@albanm albanm merged commit 38aacda into main Jun 19, 2026
3 checks passed
@albanm albanm deleted the feat-eval-exploration branch June 19, 2026 17:35