Fail fast on systemic SearchQA rollout failures #64

Open: summerview1997 wants to merge 2 commits into microsoft:main from summerview1997:codex/searchqa-rollout-failfast

Summary

This PR makes the SearchQA rollout fail fast when every item in a batch fails before the target agent produces any response.

Previously, per-item exceptions such as model endpoint misconfiguration were recorded as ordinary failed answers. If every item had agent_ok=false, the trainer could continue with a complete-looking run and all-zero scores, even though no agent responses were produced.
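To illustrate why the old behavior was misleading, here is a minimal sketch. It assumes each rollout row is a dict with `agent_ok`, `score`, and `fail_reason` keys; the exact row schema and the `mean_score` helper are hypothetical and not taken from the PR diff.

```python
from statistics import mean


def mean_score(rows):
    """Hypothetical aggregate: average the per-row score, ignoring agent_ok."""
    return mean(row["score"] for row in rows)


# The agent answered every question, but every answer was wrong.
wrong_answers = [
    {"agent_ok": True, "score": 0.0, "fail_reason": None},
    {"agent_ok": True, "score": 0.0, "fail_reason": None},
]

# The agent never answered: the model endpoint was unreachable.
endpoint_down = [
    {"agent_ok": False, "score": 0.0, "fail_reason": "connection refused"},
    {"agent_ok": False, "score": 0.0, "fail_reason": "connection refused"},
]

# Both batches aggregate to the same all-zero score.
same = mean_score(wrong_answers) == mean_score(endpoint_down)
```

Under these assumptions, an aggregate that looks only at scores cannot tell a weak model from a broken endpoint, which is the gap the guard closes.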

Changes

  • Add a SearchQA rollout guard that detects when every row in a batch has agent_ok=false.
  • Raise a runtime error summarizing the most common fail_reason.
  • Apply the guard to both resumed/cached result paths and newly completed batches.
  • Keep ordinary wrong-answer results valid when at least one row has an agent response.
  • Add regression tests for cached systemic failures and answered wrong rollouts.
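The guard described above could look roughly like the following. This is a sketch under stated assumptions, not the code in this PR: the function name `check_systemic_failure`, the `SystemicRolloutFailure` subclass, and the dict-shaped rows are hypothetical; only the `agent_ok` and `fail_reason` field names come from the description.

```python
from collections import Counter
from typing import Any, Mapping, Sequence


class SystemicRolloutFailure(RuntimeError):
    """Hypothetical error type; the PR only states that a runtime error is raised."""


def check_systemic_failure(rows: Sequence[Mapping[str, Any]]) -> None:
    """Raise if no row in the batch contains an agent response."""
    if not rows:
        return  # Nothing to judge for an empty batch.

    # One answered row is enough: wrong answers are still valid results.
    if any(row.get("agent_ok") for row in rows):
        return

    # Summarize the most common failure reason for the error message.
    reasons = Counter(str(row.get("fail_reason") or "unknown") for row in rows)
    reason, count = reasons.most_common(1)[0]
    raise SystemicRolloutFailure(
        f"all {len(rows)} SearchQA rollouts failed before the agent responded; "
        f"most common fail_reason: {reason!r} ({count}/{len(rows)})"
    )
```

Calling the same check on both cached results and freshly completed batches would cover the two paths named in the list, and a batch with at least one `agent_ok=true` row passes through unchanged.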

Impact

Infrastructure failures such as missing or unreachable model endpoints become visible immediately instead of being mistaken for model quality or skill optimization failure.

Validation

  • /home/thomas/SkillOpt/.venv/bin/python -m pytest -q tests/test_searchqa_rollout_failfast.py
  • /home/thomas/SkillOpt/.venv/bin/python -m pytest -q
  • /home/thomas/SkillOpt/.venv/bin/python -m ruff check skillopt/envs/searchqa/rollout.py tests/test_searchqa_rollout_failfast.py
  • /home/thomas/SkillOpt/.venv/bin/python -m py_compile skillopt/envs/searchqa/rollout.py tests/test_searchqa_rollout_failfast.py
  • git diff --check
