[release] v0.103.5 #4690

Merged
bekossy merged 77 commits into main from release/v0.103.5 on Jun 15, 2026

@github-actions

Contributor

New version v0.103.5 in

  • web
    • web/oss
    • web/ee
  • services
  • api
  • sdks
    • sdks/python
  • clients
    • clients/python
    • clients/typescript
  • kubernetes
    • kubernetes/helm

axelray-dev and others added 26 commits June 4, 2026 03:02
Signed-off-by: axelray-dev <110029405+axelray-dev@users.noreply.github.com>
…535]

Replace programmatic router.push with native href on the evaluator
button so that clicking always navigates even if the trace drawer
close handler does not complete. Removes the unused navigateToEvaluator
call and preventDefault, keeping only stopPropagation to avoid
triggering the parent popover's hover behavior.
One project is in scope at a time in the web app, so grouping batched
requests by project and issuing one query per project handles a state
that cannot exist. Every batchFn now takes the single project in scope,
throws if coalesced requests disagree, and resolves all ids with one
call. Documents the invariant in web/AGENTS.md.
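
The single-project invariant described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the `BatchRequest` shape, the `collapseToSingleProject` helper, and the `fetchMany` callback are all hypothetical names introduced here.

```typescript
// Hypothetical request shape; the real batch fetchers may carry other fields.
interface BatchRequest {
  projectId: string;
  id: string;
}

// Collapse coalesced requests to the one project in scope.
// Throws if the requests disagree, since that state should not exist.
function collapseToSingleProject(
  requests: BatchRequest[],
): { projectId: string; ids: string[] } {
  const [first] = requests;
  if (first === undefined) {
    throw new Error("batchFn called with no requests");
  }
  for (const request of requests) {
    if (request.projectId !== first.projectId) {
      throw new Error(
        `coalesced requests disagree on project: ${first.projectId} vs ${request.projectId}`,
      );
    }
  }
  return { projectId: first.projectId, ids: requests.map((r) => r.id) };
}

// Resolve every id with a single call. `fetchMany` stands in for the
// real API call and is an assumption of this sketch.
async function batchFn<T>(
  requests: BatchRequest[],
  fetchMany: (projectId: string, ids: string[]) => Promise<Map<string, T>>,
): Promise<(T | undefined)[]> {
  const { projectId, ids } = collapseToSingleProject(requests);
  const byId = await fetchMany(projectId, ids);
  // Results are returned in the same order as the incoming requests.
  return ids.map((id) => byId.get(id));
}
```

Failing loudly on a project mismatch, rather than silently grouping by project, keeps the invariant visible if it is ever broken.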
…ncluding suffix node support and improved evaluator name resolution.
The observability trace filter never listed annotation feedback fields
(score, comment, etc.) from evaluators, so feedback sent via the API was
not filterable.

Two causes, both fixed on the frontend:
- The filter read evaluator.metrics off thin list refs that carry no
  data; it now resolves each evaluator's latest revision via a new
  evaluatorFeedbackSchemasAtom.
- Auto-created feedback evaluators store a genson-inferred output schema
  wrapped one level deeper ({outputs:{properties}});
  resolveOutputSchemaProperties now unwraps that envelope so real metric
  keys surface.

Also corrects docs that claimed evaluators are not auto-created.
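
The envelope unwrapping from the second fix might look something like the sketch below. It assumes the wrapped shape is literally `{outputs: {properties: ...}}` and the plain shape is `{properties: ...}`; the function name `unwrapOutputProperties` is hypothetical and is not the PR's `resolveOutputSchemaProperties`.

```typescript
type JsonObject = Record<string, unknown>;

function isObject(value: unknown): value is JsonObject {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Return the metric properties of an output schema, unwrapping an
// `outputs` envelope when one is present (assumed shapes, see lead-in).
function unwrapOutputProperties(schema: unknown): JsonObject {
  if (!isObject(schema)) {
    return {};
  }
  // Wrapped shape: {outputs: {properties: {...}}}
  const outputs = schema["outputs"];
  const wrapped = isObject(outputs) ? outputs["properties"] : undefined;
  if (isObject(wrapped)) {
    return wrapped;
  }
  // Plain shape: {properties: {...}}
  const plain = schema["properties"];
  return isObject(plain) ? plain : {};
}
```

Returning an empty object for anything unrecognised means a filter built on the result simply offers no suggestions instead of throwing.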
…nd improve parent checkbox state handling in PopoverCascaderVariant
A walkthrough demo for classifying CVs against a job spec with Agenta:

- Curated test set of 30 real Markdown CVs (from the public
  opensporks/resumes dataset on Hugging Face, a mirror of the Kaggle
  Resume Dataset), hand-labeled against an IT Manager job spec
- prepare_testset.py rebuilds the CSV reproducibly and can upload it
  to Agenta via the SDK
- create_app.py creates the completion app with the screening prompt
  and structured-output JSON schema, and deploys it to production
- Streamlit demo UI: PDF upload -> Markdown (markitdown) -> prompt
  fetched from the Agenta registry -> structured score dashboard
- Sample CV PDFs (one per classification) generated from the test set

https://claude.ai/code/session_01YMbf4sUb2VBFQHGNKv6yh3
The Streamlit app now shows a thumbs up/down form with an optional
comment after each screening. Submitting it attaches the feedback to
the screening's trace in Agenta as an annotation (evaluator slug
'user-feedback'), following the capture-user-feedback cookbook:
the invocation link is captured inside the instrumented classify_cv
call and the annotation is POSTed to /api/simple/traces/.

Screening results now persist in session state so the result and
feedback form survive Streamlit reruns. Entry scripts load .env via
python-dotenv, matching the documented setup flow.

https://claude.ai/code/session_01YMbf4sUb2VBFQHGNKv6yh3
…pt revision

Move all the AI logic out of the Streamlit app into a new screening.py
module (prompt fetch, the LLM call, tracing, feedback), leaving app.py as
a UI-only shell. Any other frontend can import screening.py unchanged.

Tracing improvements so screenings are easy to act on from the UI:

- Auto-instrument the OpenAI client with OpenInference, so every trace has
  a child LLM span with the exact messages, token counts, and cost.
- classify_cv takes its inputs as a dict whose keys match the prompt input
  variables ({"cv": ...}), and the prompt config is kept out of the trace
  (ignore_inputs). The span data then mirrors the completion app's inputs.
- Link each span to the deployed prompt revision via ag.tracing.store_refs,
  so traces filter by app/environment and open in the playground on the
  right revision with inputs pre-filled.

Also fix create_app.py to read variant.variant_version as an attribute
(VariantManager now returns a ConfigurationResponse, not a dict).
The walkthrough needed a leaner story: the output schema is now
tech_match / experience_match / overall_match, each with a short reason,
plus the missing-requirements list. overall_match is a holistic
hire-or-not judgment, so a requirement like a language can flip it while
the other two stay true. The test set drops the bookkeeping columns and
carries one expected_* column per dimension; empty cells are skipped by
the code evaluator documented in the Readme.
…ty filter

Evaluators without an output schema expose no feedback metrics to suggest,
and the feedback-field Select cleared any typed value. The Select now
surfaces the typed text as a '<typed> (custom)' option that commits and
persists, so users can filter by a feedback name even when the schema can't
provide one.
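
One way to picture the '<typed> (custom)' behaviour is a small option builder like the one below. It is a sketch only: `SelectOption` and `buildFeedbackOptions` are hypothetical names, and the real Select component may build its options differently.

```typescript
// Hypothetical option shape for a select component.
interface SelectOption {
  value: string;
  label: string;
}

// Build the option list from schema-derived fields, appending the typed
// text as a "(custom)" entry when it does not match a known field.
function buildFeedbackOptions(schemaFields: string[], typed: string): SelectOption[] {
  const options: SelectOption[] = schemaFields.map((field) => ({
    value: field,
    label: field,
  }));
  const trimmed = typed.trim();
  const alreadyKnown = schemaFields.some((field) => field === trimmed);
  if (trimmed !== "" && !alreadyKnown) {
    // The value stays the raw name so the filter matches the feedback key.
    options.push({ value: trimmed, label: `${trimmed} (custom)` });
  }
  return options;
}
```

Keeping the committed value free of the "(custom)" suffix means the filter matches the feedback name exactly as the user typed it.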
@vercel

vercel Bot commented Jun 13, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| agenta-documentation | Ready | Preview, Comment | Jun 15, 2026 4:53pm |

@dosubot dosubot Bot added the size:XS This PR changes 0-9 lines, ignoring generated files. label Jun 13, 2026
…elector-ui

[Feat]: improve cascade entity selector UI
bekossy added 2 commits June 15, 2026 15:53
…avigation

[4535] fix(frontend): fix evaluator playground navigation from trace drawer
…-fetchers

refactor(frontend): drop per-project fan-out from all batch fetchers
@dosubot dosubot Bot added size:XXL This PR changes 1000+ lines, ignoring generated files. and removed size:L This PR changes 100-499 lines, ignoring generated files. labels Jun 15, 2026
jp-agenta and others added 5 commits June 15, 2026 17:12
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
[fix] Resolve broken invites in OSS (again)
@dosubot dosubot Bot added the lgtm This PR has been approved by a maintainer label Jun 15, 2026
@bekossy bekossy merged commit 58a5cca into main Jun 15, 2026
31 of 32 checks passed

Labels

lgtm This PR has been approved by a maintainer size:XXL This PR changes 1000+ lines, ignoring generated files.

10 participants