Pull requests: benchflow-ai/benchflow

Pull requests list

fix(adapters): Toolathlon spawn-time token resolution + exit-code verifier scoring
Labels: area:eval (Issue / PR lives primarily in the "eval" subsystem.); bug (Something isn't working); P1 (Important debt — must fix soon, but does not block the current release.); review:pending (PR is ready-for-review, no reviewer engagement yet.); status:ready (Triaged, unassigned, available to claim.)
#887 opened Jul 3, 2026 by bingran-you Collaborator
feat(cli): bench init onboarding wizard + bench doctor
Labels: area:diagnostics (Issue / PR lives primarily in the "diagnostics" subsystem.); enhancement (New feature or request); P1; review:pending; status:ready
#883 opened Jul 2, 2026 by Yiminnn Collaborator
fix(litellm): retry transient upstream 5xx at the proxy
Labels: area:eval; bug; P1; review:pending; status:ready
#882 opened Jul 2, 2026 by Yiminnn Collaborator
feat(multi-agent): native concurrent floor (bench eval run --agents) + bf.* trajectory tree + hosted medical benchmark
Labels: area:eval; enhancement; P2 (Anti-pattern / type safety / docs precision / minor schema drift / non-deterministic but contained.); review:changes-requested (Author needs to push more commits before this can merge.); status:blocked (Waiting on external dependency. Add a comment explaining why.)
#846 opened Jun 28, 2026 by Yiminnn Collaborator Draft
3 of 6 tasks
Merge main into the 0.7 line + land OSWorld vendored evaluator
Labels: area:eval; enhancement; P2; review:changes-requested; status:blocked
#827 opened Jun 24, 2026 by xdotli Member
fix(deps): require anyio>=4.10 so the daytona SDK imports
Labels: area:sandbox (Issue / PR lives primarily in the "sandbox" subsystem.); bug; P1; review:pending; status:blocked
#826 opened Jun 24, 2026 by xdotli Member
chore(deps)!: 7-day dependency cooldown (uv exclude-newer + CI assert)
Labels: enhancement; P2; review:pending; status:ready
#788 opened Jun 15, 2026 by xdotli Member
feat(sandbox): Enforce network_mode allowlist on docker
Labels: area:sandbox; enhancement; P1; review:changes-requested; status:blocked
#785 opened Jun 15, 2026 by Yiminnn Collaborator