feat(kanban): nightly drift reconciler + per-repo .kanban.yml override #55
Merged
LukasWodka merged 1 commit on Jun 3, 2026
Two enhancements to harden the kanban automation against silent stalls:
1. **kanban-reconcile.yml** (new) — nightly cron at 04:00 UTC scans the kanban
for non-terminal columns and self-heals four classes of drift:
- Merged PRs whose SHA is on the repo's prod branch → Prod
- Merged release-vehicle PRs ("Release: …", "Dev to staging", "Staging to prod") → Prod
- Closed-not-merged PRs in any non-terminal column → Cancelled
- Open issues sitting in post-merge columns → Backlog
Safety cap: aborts if a single run would move more than 100 items (treats
that as a bug, not real drift). Supports `dry-run` via workflow_dispatch.
2. **advance-deploy-env.yml** — added `.kanban.yml` per-repo override. Repos
that deploy out-of-band (e.g. averaging-service ships Docker images from
staging, no main push) can declare:
    # .kanban.yml
    branch_status_map:
      staging: Prod
The override merges with the default mapping (develop→FR-on-dev,
staging→FR-on-staging, master|main→Prod). This was the root cause of
19 averaging-service items getting stuck in Ready-for-prod today —
main was never pushed because the deploy convention doesn't require it.
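To illustrate the merge described in item 2, here is a minimal Python sketch. It is not the workflow's actual code: the function name `resolve_branch_status_map` and the rule that an override key replaces the default for that branch while all other defaults are kept are assumptions. The column names follow the default mapping quoted above.

```python
# Default mapping as stated in this PR:
# develop -> FR on dev, staging -> FR on staging, master|main -> Prod.
DEFAULT_BRANCH_STATUS_MAP = {
    "develop": "FR on dev",
    "staging": "FR on staging",
    "master": "Prod",
    "main": "Prod",
}


def resolve_branch_status_map(override=None):
    """Layer a repo's branch_status_map override on top of the defaults.

    Hypothetical helper: override keys win, untouched defaults remain.
    """
    merged = dict(DEFAULT_BRANCH_STATUS_MAP)  # copy so defaults stay intact
    merged.update(override or {})
    return merged


# averaging-service declares `staging: Prod` in its .kanban.yml.
averaging_service = resolve_branch_status_map({"staging": "Prod"})
print(averaging_service["staging"])  # Prod
print(averaging_service["develop"])  # FR on dev
```

Under these assumptions a staging push in averaging-service maps straight to Prod, while repos without a `.kanban.yml` keep the default behaviour.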
Catches everything we hand-cleaned today (≈109 items). Layer 2 (stuck-in-
column nudge bot) coming as a follow-up PR.
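The four drift classes and the safety cap from item 1 can be sketched in Python as follows. This is an illustrative model, not the workflow's implementation: the `Item` fields, `classify`, `plan_moves`, and the assumption that "post-merge columns" means every non-terminal column except `Code review` are all hypothetical. The title pattern and the 100-item cap are taken from the PR text.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

NON_TERMINAL = {"Code review", "FR on dev", "Ready for staging",
                "FR on staging", "Ready for prod"}
# Assumption: post-merge columns are the non-terminal ones after Code review.
POST_MERGE = NON_TERMINAL - {"Code review"}
RELEASE_TITLE = re.compile(r"^(Release|Dev to staging|Staging to prod)")
MAX_MOVES = 100  # safety cap from the PR description


@dataclass
class Item:
    kind: str                  # "pr" or "issue"
    state: str                 # "open", "closed" (not merged), or "merged"
    title: str
    column: str
    sha_on_prod: bool = False  # merge SHA found on the repo's prod branch


def classify(item: Item) -> Optional[Tuple[str, str]]:
    """Return (target column, reason) for a drifted item, else None."""
    if item.column not in NON_TERMINAL:
        return None
    if item.kind == "pr" and item.state == "merged":
        if item.sha_on_prod:
            return ("Prod", "drift-to-prod")
        if RELEASE_TITLE.match(item.title):
            return ("Prod", "release-vehicle")
        return None
    if item.kind == "pr" and item.state == "closed":
        return ("Cancelled", "cancelled")
    if item.kind == "issue" and item.state == "open" and item.column in POST_MERGE:
        return ("Backlog", "misplaced-issue")
    return None


def plan_moves(items: List[Item], max_moves: int = MAX_MOVES):
    """Build the move plan; abort if it exceeds the safety cap."""
    moves = []
    for item in items:
        result = classify(item)
        if result is not None:
            moves.append((item, result[0], result[1]))
    if len(moves) > max_moves:
        raise RuntimeError(
            f"planned {len(moves)} moves, above the cap of {max_moves}; aborting")
    return moves
```

In a dry run, the plan returned by `plan_moves` would only be reported; applying the moves to the board would be a separate step.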
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
👋 Heads-up: Code review queue is at 12 / 8, above the WIP limit. The team convention is to review existing PRs before opening new work. Open PRs currently in Code review (oldest first):
Pull from review before opening new work. (This is a nudge from the kanban WIP check, not a block.)
Summary
Two enhancements to harden the kanban automation against silent stalls — catches the failure modes that produced the ≈109 items we hand-swept today.
1. `kanban-reconcile.yml`: nightly drift reconciler (new)

Cron at 04:00 UTC daily scans non-terminal columns (`Code review`, `FR on dev`, `Ready for staging`, `FR on staging`, `Ready for prod`) and self-heals four classes of drift:

- Merged PRs whose SHA is on the repo's prod branch → Prod (`drift-to-prod`)
- Merged PRs whose title matches the `^(Release|Dev to staging|Staging to prod)` pattern → Prod (`release-vehicle`)
- Closed-not-merged PRs in any non-terminal column → Cancelled (`cancelled`)
- Open issues sitting in post-merge columns → Backlog (`misplaced-issue`)

Safety net: aborts if any single run would move > 100 items (treats that as a bug, not real drift). Supports `workflow_dispatch` with `dry-run: true` for safe testing.

2. `advance-deploy-env.yml`: `.kanban.yml` per-repo override

For repos where the default branch isn't the prod-truth (e.g. averaging-service ships Docker images from `staging` without ever pushing `main`), drop a `.kanban.yml` at the repo root that sets `staging: Prod` under `branch_status_map`.

The override merges with the default mapping. This was the root cause of 19 averaging-service items getting stuck in `Ready for prod` today: `main` was never pushed because the deploy convention doesn't require it.

Why now
Cleanup audit this session moved ~109 items from `Ready for prod` / `FR on staging` to `Prod` (or `Backlog` for misclassified open issues). Every move fit one of the four patterns above. With this reconciler running nightly, the next time something silently drifts it'll be auto-healed within 24 hours instead of accumulating for weeks.

Diagnosis details in the conversation log; key root causes that still bite without this PR:

- `advance-deploy-env` silently crashed for weeks before the `-F` → `-f` fix landed (fix: pass kanban option IDs as raw strings (-f not -F) to survive numeric IDs #52); items in flight when the bug hit never moved.
- … `.kanban.yml`.

Test plan

- … `planned: 0` since today's cleanup left RfP / FR-on-staging empty.
- … `dry-run: true` from the Actions tab to confirm classification logic before relying on auto-apply.
- … `.kanban.yml` into `averaging-service` to lock in the `staging: Prod` convention.

Follow-ups (separate PR)

- … `.kanban.yml` is in averaging-service, the next staging push will auto-advance items to `Prod` and the reconciler becomes a backstop, not the primary path.

🤖 Generated with Claude Code