[release] v0.103.5#4690
Merged
Signed-off-by: axelray-dev <110029405+axelray-dev@users.noreply.github.com>
…535] Replace programmatic router.push with native href on the evaluator button so that clicking always navigates even if the trace drawer close handler does not complete. Removes the unused navigateToEvaluator call and preventDefault, keeping only stopPropagation to avoid triggering the parent popover's hover behavior.
One project is in scope at a time in the web app, so grouping batched requests by project and issuing one query per project handles a state that cannot exist. Every batchFn now takes the single project in scope, throws if coalesced requests disagree, and resolves all ids with one call. Documents the invariant in web/AGENTS.md.
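A minimal sketch of the invariant this commit describes, not the actual fetcher code. The names `BatchRequest`, `singleProjectInScope`, `FetchMany`, and `makeBatchFn` are hypothetical and only illustrate the shape: confirm that every coalesced request names the same project, then resolve all ids with one call.

```typescript
// Hypothetical request shape; the real batch fetchers may carry more fields.
interface BatchRequest {
  projectId: string;
  id: string;
}

// Returns the one project shared by every request, or throws if they disagree.
function singleProjectInScope(requests: readonly BatchRequest[]): string {
  const first = requests[0];
  if (first === undefined) {
    throw new Error("batchFn received no requests");
  }
  for (const request of requests) {
    if (request.projectId !== first.projectId) {
      throw new Error(
        "batched requests span projects: " + first.projectId + " and " + request.projectId,
      );
    }
  }
  return first.projectId;
}

// Hypothetical fetcher signature: one call returns every requested id for one project.
type FetchMany<T> = (projectId: string, ids: string[]) => Promise<Record<string, T>>;

function makeBatchFn<T>(fetchMany: FetchMany<T>) {
  return async (requests: readonly BatchRequest[]): Promise<Array<T | undefined>> => {
    const projectId = singleProjectInScope(requests);
    // Drop duplicate ids so the single query asks for each id once.
    const ids = requests
      .map((r) => r.id)
      .filter((id, index, all) => all.indexOf(id) === index);
    const byId = await fetchMany(projectId, ids);
    // Answer in request order; ids the backend did not return resolve to undefined.
    return requests.map((r) => byId[r.id]);
  };
}
```

Throwing on a mismatch, rather than silently grouping by project, turns a violated invariant into a visible bug instead of an extra query.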
…child selection features
…ncluding suffix node support and improved evaluator name resolution.
…dling and metadata display
The observability trace filter never listed annotation feedback fields
(score, comment, etc.) from evaluators, so feedback sent via the API was
not filterable.
Two causes, both fixed on the frontend:
- The filter read evaluator.metrics off thin list refs that carry no
data; it now resolves each evaluator's latest revision via a new
evaluatorFeedbackSchemasAtom.
- Auto-created feedback evaluators store a genson-inferred output schema
wrapped one level deeper ({outputs:{properties}}); resolveOutputSchemaProperties
now unwraps that envelope so real metric keys surface.
Also corrects docs that claimed evaluators are not auto-created.
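The unwrapping described in the second bullet can be sketched roughly as follows. This is an illustration under assumptions, not the body of resolveOutputSchemaProperties: the `JsonSchema` type, the helper names, and the reading of the envelope as a top-level `outputs` key holding its own `properties` are all hypothetical.

```typescript
// Hypothetical, trimmed-down schema type for illustration only.
interface JsonSchema {
  type?: string;
  properties?: Record<string, JsonSchema>;
  outputs?: JsonSchema; // assumed envelope key on auto-created evaluators
}

// Returns the properties map that holds the real metric keys.
function unwrapOutputProperties(schema: JsonSchema | undefined): Record<string, JsonSchema> {
  if (schema === undefined) {
    return {};
  }
  // Plain schema: the metric keys sit directly under `properties`.
  if (schema.properties !== undefined) {
    return schema.properties;
  }
  // Enveloped schema: the metric keys sit one level deeper, under `outputs.properties`.
  if (schema.outputs !== undefined && schema.outputs.properties !== undefined) {
    return schema.outputs.properties;
  }
  return {};
}

// Sorted metric names a filter could offer for one evaluator.
function metricKeys(schema: JsonSchema | undefined): string[] {
  return Object.keys(unwrapOutputProperties(schema)).sort();
}
```

With this shape, a plain schema and an enveloped schema carrying the same fields yield the same metric keys, so the filter does not need to know how the evaluator was created.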
… and improved UI interactions
…nd improve parent checkbox state handling in PopoverCascaderVariant
A walkthrough demo for classifying CVs against a job spec with Agenta:
- Curated test set of 30 real Markdown CVs (from the public opensporks/resumes dataset on Hugging Face, a mirror of the Kaggle Resume Dataset), hand-labeled against an IT Manager job spec
- prepare_testset.py rebuilds the CSV reproducibly and can upload it to Agenta via the SDK
- create_app.py creates the completion app with the screening prompt and structured-output JSON schema, and deploys it to production
- Streamlit demo UI: PDF upload -> Markdown (markitdown) -> prompt fetched from the Agenta registry -> structured score dashboard
- Sample CV PDFs (one per classification) generated from the test set
https://claude.ai/code/session_01YMbf4sUb2VBFQHGNKv6yh3
The Streamlit app now shows a thumbs up/down form with an optional comment after each screening. Submitting it attaches the feedback to the screening's trace in Agenta as an annotation (evaluator slug 'user-feedback'), following the capture-user-feedback cookbook: the invocation link is captured inside the instrumented classify_cv call and the annotation is POSTed to /api/simple/traces/.
Screening results now persist in session state so the result and feedback form survive Streamlit reruns. Entry scripts load .env via python-dotenv, matching the documented setup flow.
https://claude.ai/code/session_01YMbf4sUb2VBFQHGNKv6yh3
…pt revision
Move all the AI logic out of the Streamlit app into a new screening.py
module (prompt fetch, the LLM call, tracing, feedback), leaving app.py as
a UI-only shell. Any other frontend can import screening.py unchanged.
Tracing improvements so screenings are easy to act on from the UI:
- Auto-instrument the OpenAI client with OpenInference, so every trace has
a child LLM span with the exact messages, token counts, and cost.
- classify_cv takes its inputs as a dict whose keys match the prompt input
variables ({"cv": ...}), and the prompt config is kept out of the trace
(ignore_inputs). The span data then mirrors the completion app's inputs.
- Link each span to the deployed prompt revision via ag.tracing.store_refs,
so traces filter by app/environment and open in the playground on the
right revision with inputs pre-filled.
Also fix create_app.py to read variant.variant_version as an attribute
(VariantManager now returns a ConfigurationResponse, not a dict).
The walkthrough needed a leaner story: the output schema is now tech_match / experience_match / overall_match, each with a short reason, plus the missing-requirements list. overall_match is a holistic hire-or-not judgment, so a requirement like a language can flip it while the other two stay true. The test set drops the bookkeeping columns and carries one expected_* column per dimension; empty cells are skipped by the code evaluator documented in the README.
…ty filter
Evaluators without an output schema expose no feedback metrics to suggest, and the feedback-field Select cleared any typed value. The Select now surfaces the typed text as a '<typed> (custom)' option that commits and persists, so users can filter by a feedback name even when the schema can't provide one.
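One way to picture the '<typed> (custom)' behavior is a small pure function that builds the option list. This is a sketch under assumptions rather than the component's code: `SelectOption` and `withCustomOption` are hypothetical names, and the real Select may handle matching and persistence differently.

```typescript
// Hypothetical option shape for a Select-style dropdown.
interface SelectOption {
  value: string;
  label: string;
}

// Appends a "<typed> (custom)" option when the typed text matches no schema-provided option.
function withCustomOption(
  schemaOptions: readonly SelectOption[],
  typed: string,
): SelectOption[] {
  const text = typed.trim();
  const options = schemaOptions.slice();
  if (text === "") {
    return options;
  }
  const alreadyKnown = options.some((option) => option.value === text);
  if (!alreadyKnown) {
    // The value stays the raw typed text so the filter matches the feedback name exactly.
    options.push({ value: text, label: text + " (custom)" });
  }
  return options;
}
```

For an evaluator with no output schema, `withCustomOption([], "helpfulness")` yields a single option whose value is the typed name, which gives the user something to commit.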
… evaluations page contract
…elector-ui [Feat]: improve cascade entity selector UI
… query and improve cache handling
…avigation [4535] fix(frontend): fix evaluator playground navigation from trace drawer
…-fetchers refactor(frontend): drop per-project fan-out from all batch fetchers
feat(examples): CV screening demo with feedback-to-deploy walkthrough
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
[fix] Resolve broken invites in OSS (again)
bekossy approved these changes Jun 15, 2026
New version v0.103.5 in