feat(traces): superadmin evaluation via a promoted evaluator account #36
Merged
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…uator Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ount Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The gateway refuses any account without an assistant model, so a source account with only an evaluator model advertised availability while every evaluator call returned 404. evaluatorAvailable now requires both models, the e2e test asserts a real evaluator tool response (not just request dispatch), and a new API test case locks in the evaluator-only, no-assistant contract.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
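The availability rule in this commit message can be illustrated with a minimal sketch. The `AccountModels` shape and the standalone function signature are assumptions made for illustration; the PR does not show how the real settings object or `evaluatorAvailable` computation is structured.

```typescript
// Hypothetical shape: the real settings type is not shown in this PR.
interface AccountModels {
  assistant?: string; // id of the assistant model, if configured
  evaluator?: string; // id of the evaluator model, if configured
}

// Availability is advertised only when the source account could actually
// serve evaluator calls: the gateway refuses accounts without an assistant
// model, so an evaluator model alone is not enough.
function evaluatorAvailable(models: AccountModels | undefined): boolean {
  return Boolean(models?.assistant) && Boolean(models?.evaluator);
}
```

Under this sketch, an account with only an evaluator model reports `false`, which matches the evaluator-only, no-assistant contract the new test case is described as locking in.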
…ation
clean() only wipes settings whose owner.id matches /^test/, so seeding the source account at user/superadmin leaked state across runs and made the admin-info availability test history-dependent. Point the dev evaluatorAccount (and the tests) at organization/test1, which clean() resets between runs.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
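To see why the seeded account leaked state, the owner filter described above can be sketched as a predicate. The `OwnedSettings` type and the `isWipedByClean` helper are hypothetical names; only the `/^test/` match on `owner.id` comes from the commit message.

```typescript
// Hypothetical shape for a stored settings document.
interface OwnedSettings {
  owner: { type: string; id: string };
}

// Assumed reading of the rule in the commit message: a settings document
// is reset between test runs only if its owner id starts with "test".
function isWipedByClean(settings: OwnedSettings): boolean {
  return /^test/.test(settings.owner.id);
}

const seededBefore: OwnedSettings = { owner: { type: "user", id: "superadmin" } };
const seededAfter: OwnedSettings = { owner: { type: "organization", id: "test1" } };

// The old seed survives clean(); the new one is reset.
const oldSeedWiped = isWipedByClean(seededBefore);
const newSeedWiped = isWipedByClean(seededAfter);
```

Because `superadmin` does not start with `test`, the old seed persisted across runs, while `test1` falls inside the range that is reset.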
# Conflicts:
#   api/config/custom-environment-variables.js
#   api/config/default.js
#   api/config/type/.type/index.d.ts
#   api/config/type/.type/validate.js
#   api/config/type/schema.json
#   api/src/admin/router.ts
#   ui/src/components/TraceReview.vue
Let admin-mode superadmins evaluate any account's traces using a configured source account's evaluator, so the reviewed account is never charged.
What changed:
- `resolveUsageIdentity` now treats an `adminMode` superadmin as an admin of whatever account a gateway/summary call targets, so they may consume it. That account's quotas still apply and its usage is still recorded (under the superadmin's id); there is no skip-billing.
- `config.evaluatorAccount` (`{type, id}`, env `EVALUATOR_ACCOUNT_TYPE` / `EVALUATOR_ACCOUNT_ID`, default off) designates the source account. Admin `/info` advertises it plus `evaluatorAvailable` (true only when that account has both an assistant and an evaluator model, because the gateway refuses any account without an assistant).
- In admin mode, `EvaluatorChat` (and its summarizer tool) is pointed at the source account, so the reviewed account is never called; the chat is disabled with a hint when the source is unset/unconfigured or admin mode is off. The non-admin account-admin review path is unchanged.

Why: superadmins need to review traces across accounts without consuming the reviewed account's tokens.
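The `resolveUsageIdentity` behavior described above might look roughly like the following. This is a sketch under stated assumptions, not the PR's code: the `ReviewSession` and `UsageIdentity` types, the `adminOf` list, and the `null` return for callers who are neither account admins nor admin-mode superadmins are all hypothetical, and the real function presumably handles other roles that this sketch ignores.

```typescript
type AccountRef = { type: string; id: string };

// Hypothetical session shape, invented for this sketch.
interface ReviewSession {
  userId: string;
  adminMode: boolean; // assumed to be validated upstream before reaching this function
  adminOf: AccountRef[]; // accounts where the user is an ordinary admin
}

// Hypothetical result: which account pays, and under whose id usage is logged.
interface UsageIdentity {
  account: AccountRef; // the targeted account; its quotas still apply
  recordedUserId: string; // usage is recorded under the caller's id
}

function resolveUsageIdentitySketch(
  session: ReviewSession,
  target: AccountRef
): UsageIdentity | null {
  const isAccountAdmin = session.adminOf.some(
    (a) => a.type === target.type && a.id === target.id
  );
  // An admin-mode superadmin is treated as an admin of any targeted account.
  if (!session.adminMode && !isAccountAdmin) return null;
  // No billing bypass: the target account is charged and the caller is recorded.
  return { account: target, recordedUserId: session.userId };
}
```

The point of the sketch is that admin mode widens who may consume an account without changing which account's quotas and usage records are affected.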
Heads-up:
- An `adminMode` superadmin can now consume any account's gateway/summary as admin (not just the evaluator). `adminMode` is JWT-validated and only true for genuine superadmins.
- Point `config.evaluatorAccount` at a dedicated, fully-configured account; don't add `admin` to its moderation categories, or superadmin review messages would be moderated.
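As an illustration of the "default off" configuration, here is a minimal sketch of deriving the source account from the two environment variables named in this PR. The `evaluatorAccountFromEnv` helper and the rule that the setting stays off unless both variables are present are assumptions; the project's actual config loader and validation are not shown here.

```typescript
interface EvaluatorAccount {
  type: string;
  id: string;
}

// Hypothetical helper: returns undefined (feature off) unless both
// EVALUATOR_ACCOUNT_TYPE and EVALUATOR_ACCOUNT_ID are set and non-empty.
function evaluatorAccountFromEnv(
  env: Record<string, string | undefined>
): EvaluatorAccount | undefined {
  const type = env["EVALUATOR_ACCOUNT_TYPE"];
  const id = env["EVALUATOR_ACCOUNT_ID"];
  if (!type || !id) return undefined;
  return { type, id };
}
```

For example, setting `EVALUATOR_ACCOUNT_TYPE=organization` and `EVALUATOR_ACCOUNT_ID=test1` would yield `{ type: "organization", id: "test1" }` in this sketch, while an environment with neither variable leaves the feature off.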