
feat(kanban): nightly drift reconciler + per-repo .kanban.yml override#55

Merged: LukasWodka merged 1 commit into develop from feat/kanban-reconcile-and-prod-branch-override on Jun 3, 2026


@LukasWodka (Contributor)

Summary

Two enhancements to harden the kanban automation against silent stalls. Together they catch the failure modes that produced the ≈109 items we hand-swept today.

1. kanban-reconcile.yml — nightly drift reconciler (new)

A cron job runs daily at 04:00 UTC, scans the non-terminal columns (Code review, FR on dev, Ready for staging, FR on staging, Ready for prod), and self-heals four classes of drift:

| Class | Action |
|---|---|
| Merged PR whose SHA is on the repo's prod branch | → Prod (`drift-to-prod`) |
| Merged PR whose title matches `^(Release\|Dev to staging\|Staging to prod)` | → Prod (`release-vehicle`) |
| Closed-not-merged PR in any non-terminal column | → Cancelled (`cancelled`) |
| Open issue in a post-merge column | → Backlog (`misplaced-issue`) |
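A minimal sketch of how the four drift classes could be expressed as one classification function. It assumes each board item has already been fetched and reduced to a small record, and that "post-merge" means every non-terminal column after Code review. The `Item` type, its fields, and `classify` are hypothetical names, not the workflow's actual code; whether a merged PR's SHA is on the prod branch is assumed to be precomputed.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple

# Column names taken from the PR description.
NON_TERMINAL = {"Code review", "FR on dev", "Ready for staging", "FR on staging", "Ready for prod"}
# Assumption: every non-terminal column after Code review counts as post-merge.
POST_MERGE = NON_TERMINAL - {"Code review"}
RELEASE_TITLE = re.compile(r"^(Release|Dev to staging|Staging to prod)")


@dataclass
class Item:  # hypothetical, simplified view of a board item
    kind: str    # "pr" or "issue"
    state: str   # "open", "closed" (not merged) or "merged"
    column: str
    title: str = ""
    sha_on_prod: bool = False  # assumed precomputed: merge SHA reachable from the prod branch


def classify(item: Item) -> Optional[Tuple[str, str]]:
    """Return (target_column, reason) for a drifted item, or None if it should stay put."""
    if item.column not in NON_TERMINAL:
        return None
    if item.kind == "pr":
        if item.state == "merged":
            if item.sha_on_prod:
                return ("Prod", "drift-to-prod")
            if RELEASE_TITLE.match(item.title):
                return ("Prod", "release-vehicle")
        elif item.state == "closed":
            return ("Cancelled", "cancelled")
    elif item.kind == "issue" and item.state == "open" and item.column in POST_MERGE:
        return ("Backlog", "misplaced-issue")
    return None
```

Checking `drift-to-prod` before `release-vehicle` mirrors the table order; a merged release PR that is already on prod is tagged with the first reason.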

Safety net: a run aborts if it would move more than 100 items (treated as a bug, not real drift). It also supports `workflow_dispatch` with `dry-run: true` for safe testing.
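The safety cap and dry-run switch can be sketched as a guard around the apply step. This assumes the reconciler first builds the full list of planned moves and only then applies them; `apply_moves`, `DriftCapExceeded`, the tuple shape, and the `planned: N` log line are hypothetical illustrations, not the workflow's real output.

```python
from typing import Callable, List, Tuple

MAX_MOVES = 100  # cap stated in the PR: more than this is treated as a bug


class DriftCapExceeded(RuntimeError):
    """Raised (failing the run) when the plan is implausibly large."""


def apply_moves(
    planned: List[Tuple[str, str, str]],  # (item_id, target_column, reason)
    dry_run: bool,
    move: Callable[[str, str], None],     # performs one board move
) -> int:
    """Apply planned moves unless capped or in dry-run mode; return how many were applied."""
    if len(planned) > MAX_MOVES:
        raise DriftCapExceeded(f"refusing to move {len(planned)} items (> {MAX_MOVES}); likely a bug")
    print(f"planned: {len(planned)}")
    if dry_run:
        # Report only; nothing on the board changes.
        for item_id, target, reason in planned:
            print(f"[dry-run] {item_id} -> {target} ({reason})")
        return 0
    for item_id, target, _reason in planned:
        move(item_id, target)
    return len(planned)
```

Checking the cap before touching anything means a misbehaving query fails the run cleanly instead of half-applying a bad plan.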

2. advance-deploy-env.yml (.kanban.yml per-repo override)

For repos where the default branch isn't the source of truth for prod (e.g. averaging-service ships Docker images from staging and never pushes to main), drop a .kanban.yml at the repo root:

```yaml
# averaging-service/.kanban.yml
branch_status_map:
  staging: Prod
```

The override merges with the default mapping. Its absence was the root cause of 19 averaging-service items getting stuck in Ready for prod today: main was never pushed, because the deploy convention doesn't require it.
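A sketch of the merge, assuming `.kanban.yml` has already been parsed (e.g. with a YAML library) into a plain dict and that "merges" means a key-level override where repo entries win. The default entries come from the commit message (develop→FR on dev, staging→FR on staging, master|main→Prod); `effective_map` and `DEFAULT_BRANCH_STATUS_MAP` are hypothetical names.

```python
from typing import Dict, Optional

# Default branch -> column mapping, as listed in the commit message.
DEFAULT_BRANCH_STATUS_MAP: Dict[str, str] = {
    "develop": "FR on dev",
    "staging": "FR on staging",
    "master": "Prod",
    "main": "Prod",
}


def effective_map(override: Optional[Dict[str, str]]) -> Dict[str, str]:
    """Shallow-merge a repo's branch_status_map over the defaults; override keys win."""
    merged = dict(DEFAULT_BRANCH_STATUS_MAP)  # copy so the defaults are never mutated
    merged.update(override or {})             # a missing or empty file keeps the defaults
    return merged
```

With the averaging-service override, `effective_map({"staging": "Prod"})["staging"]` is `"Prod"`, while `develop` and `main` keep their default columns.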

Why now

The cleanup audit in this session moved ≈109 items from Ready for prod / FR on staging to Prod (or to Backlog for misclassified open issues). Every move fit one of the four patterns above. With this reconciler running nightly, the next time something silently drifts it will be auto-healed within 24 hours instead of accumulating for weeks.

Diagnosis details are in the conversation log. Key root causes that still bite without this PR:

Test plan

  • Merge to develop; let the natural develop → main promotion carry it up.
  • The first scheduled run (the next 04:00 UTC after the main merge) should report `planned: 0`, since today's cleanup left Ready for prod and FR on staging empty.
  • Trigger one manual run with `dry-run: true` from the Actions tab to confirm the classification logic before relying on auto-apply.
  • Once that is confirmed, drop a .kanban.yml into averaging-service to lock in the `staging: Prod` convention.

Follow-ups (separate PR)

  • Layer 2 — stuck-in-column nudge bot (weekly Monday Slack report on items idle past per-column thresholds).
  • Once .kanban.yml is in averaging-service, the next staging push will auto-advance items to Prod and the reconciler becomes a backstop, not the primary path.

🤖 Generated with Claude Code

Two enhancements to harden the kanban automation against silent stalls:

1. **kanban-reconcile.yml** (new): a nightly cron at 04:00 UTC scans the
   kanban's non-terminal columns and self-heals four classes of drift:
   - Merged PRs whose SHA is on the repo's prod branch → Prod
   - Merged release-vehicle PRs ("Release: …", "Dev to staging", "Staging to prod") → Prod
   - Closed-not-merged PRs in any non-terminal column → Cancelled
   - Open issues sitting in post-merge columns → Backlog

   Safety cap: aborts if a single run would move more than 100 items (treats
   that as a bug, not real drift). Supports `dry-run` via `workflow_dispatch`.

2. **advance-deploy-env.yml** — added `.kanban.yml` per-repo override. Repos
   that deploy out-of-band (e.g. averaging-service ships Docker images from
   staging, no main push) can declare:

     # .kanban.yml
     branch_status_map:
       staging: Prod

   The override merges with the default mapping (develop→FR-on-dev,
   staging→FR-on-staging, master|main→Prod). This was the root cause of
   19 averaging-service items getting stuck in Ready-for-prod today —
   main was never pushed because the deploy convention doesn't require it.
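The first drift class depends on whether a merged PR's commit is "on" the prod branch, i.e. reachable from the branch head by following parent links. In a local checkout, `git merge-base --is-ancestor <sha> <branch>` answers exactly this (exit status 0 means yes). The sketch below shows the same reachability check over an in-memory parent map; `is_ancestor` and the `parents` dict are illustrative assumptions, not necessarily how the workflow performs the check.

```python
from collections import deque
from typing import Dict, List


def is_ancestor(sha: str, branch_head: str, parents: Dict[str, List[str]]) -> bool:
    """True if `sha` is reachable from `branch_head` (a commit counts as its own ancestor)."""
    seen = set()
    queue = deque([branch_head])
    while queue:  # breadth-first walk back through history
        commit = queue.popleft()
        if commit == sha:
            return True
        if commit in seen:
            continue
        seen.add(commit)
        queue.extend(parents.get(commit, []))  # merge commits have several parents
    return False
```

Usage: with `parents = {"m2": ["m1", "f1"], "m1": ["base"], "f1": ["base"], "base": []}`, a feature commit `f1` merged via `m2` is on a branch whose head is `m2`, so `is_ancestor("f1", "m2", parents)` is `True`.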

Catches everything we hand-cleaned today (≈109 items). Layer 2 (stuck-in-column
nudge bot) is coming as a follow-up PR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@LukasWodka (Contributor, Author)

👋 Heads-up: the Code review queue is at 12 / 8 (items / WIP limit)

Above the WIP limit. The team convention is to review existing PRs before opening new work.

Open PRs currently in Code review (oldest first):

Pull from review before opening new work. (This is a nudge from the kanban WIP check, not a block.)
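The WIP nudge can be sketched as a threshold check on the Code review column. The limit of 8 comes from the "12 / 8" figure; treating only counts strictly above the limit as a breach, the `(title, opened_at)` input shape, and the name `wip_nudge` are assumptions, not the real check's implementation.

```python
from datetime import datetime
from typing import List, Optional, Tuple

WIP_LIMIT = 8  # from the "12 / 8" nudge above


def wip_nudge(in_review: List[Tuple[str, datetime]], limit: int = WIP_LIMIT) -> Optional[str]:
    """Return a nudge message if Code review is over its WIP limit, else None (never blocks)."""
    if len(in_review) <= limit:
        return None
    oldest_first = sorted(in_review, key=lambda pr: pr[1])  # sort by opened-at timestamp
    lines = [
        f"Code review queue is at {len(in_review)} / {limit}",
        "Open PRs currently in Code review (oldest first):",
    ]
    lines += [f"- {title}" for title, _opened in oldest_first]
    return "\n".join(lines)
```

Returning a message rather than failing a check matches the comment's framing: it is a nudge, not a block.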

LukasWodka merged commit da295b8 into develop on Jun 3, 2026
3 checks passed