perf(queue_storage): live terminal counter for queue_counts_exact

## Context

PR #289 added `QueueStorage::queue_counts_fast` — an index-only depth probe for high-cadence pollers (admin dashboards, depth-target producers, soak observability). It is *not* a replacement for `queue_counts_exact`: it under-counts `completed` while the live `done_entries` segment hasn't rolled up, and the exact path is still what admin tooling reaches for when the live terminal segment matters.

`queue_counts_exact` itself stays slow at deep backlog. Two CTEs dominate:

- `live_terminal`: `count(*) FROM done_entries WHERE queue = $1`. The per-slot `done_entries` child tables have a composite index leading with `(queue, priority, enqueue_shard, lane_seq)`, so a queue-only filter can in principle index-only-scan — but that still has to count every matching terminal row, and append-heavy recent rows aren't reliably index-only until vacuum marks pages all-visible.
- `live_running`'s `lease_claims × lease_claim_closures` anti-join — separate problem, not in scope here.

A naive `done_entries(queue)` index would mostly duplicate the leading columns of the existing composite index. It might cut buffer reads, but it's still O(live terminal rows) per call.

The better shape is **an exact rollup that makes `queue_counts_exact` O(queue slots × shards), not O(done rows)**.

## Proposal

Add a live terminal counter alongside `queue_terminal_rollups.pruned_completed_count`, keyed naturally on the same dimensions the hot path already partitions on:

| Counter | Key shape |
|---|---|
| `queue_terminal_live_counts.completed` | `(ready_slot, queue, priority, enqueue_shard)` |

(Or `(ready_slot, queue, priority)` if `enqueue_shard` granularity isn't worth the extra rows.)

**Write paths to update**:

- `insert_done_rows_tx` — group by `(ready_slot, queue, priority, enqueue_shard)` and `INSERT … ON CONFLICT … DO UPDATE SET completed = completed + EXCLUDED.completed` once per group per batch.
- Terminal row removal: `retry_job_tx` (terminal → retryable), `move_failed_to_dlq` (terminal → DLQ), any other terminal-deleting path. Decrement by the moved count.

**Prune path**: when a `ready_slot` is pruned, fold its live counters into `queue_terminal_rollups.pruned_completed_count` for that `(queue, priority)` (sum or max, matching the existing aggregator), then `DELETE FROM queue_terminal_live_counts WHERE ready_slot = $slot`.

**Exact query becomes**:

```sql
exact_completed =
    sum(queue_terminal_rollups.pruned_completed_count) +
    sum(queue_terminal_live_counts.completed)
WHERE queue = ANY($1)
```

Both are bounded by `num_ready_slots × num_priority × num_enqueue_shards`. No `done_entries` scan.

## Risks

- **Write-path contention**: the new counter is hit by every `insert_done_rows_tx` batch. Keep it naturally sharded by `(ready_slot, queue, priority, enqueue_shard)` rather than a single hot `(queue)` row to avoid serialising completions.
- **WAL / throughput regression**: each batch grows by N upserts (one per group). Benchmark before merging. Worst-case skew is single-queue single-priority single-shard workloads where every batch hits the same row — measure that explicitly.
- **Correctness on counted-removal paths**: `retry_job_tx` and DLQ move have to decrement the *correct* `(ready_slot, queue, priority, enqueue_shard)` group, which means they need to read the row being moved before decrementing. Verify by hand against each terminal-removal call site.

## Out of scope here

- The `lease_claims × lease_claim_closures` anti-join cost in `live_running`. Still slow at high open-claim counts in receipt-plane modes. Worth a separate ticket.
- Bench harness polling cadence (separate small change to switch the long-horizon depth poller off the 200 ms tight loop).
- The "could we just add `done_entries(queue)` and move on?" branch — worth measuring before discarding, but the bet is that it cuts I/O but not the row count and so doesn't pay back the index maintenance cost. PR #289 ships the depth-probe API regardless.

## Empirical first step (optional)

Before the denormalisation lands, a narrow regression bench against the current composite index — `count(*) FROM done_entries_$slot WHERE queue = $1` per slot, summed in the harness — would calibrate how much of the gap a queue-only index could close. If that's already within target latency for admin UI, the denormalisation may be overkill. The expectation is that it isn't.

Refs: #289, ADR-019, ADR-026 (narrow terminal history).

## Acceptance criteria

- `queue_counts_exact.completed` no longer scans `done_entries` — the
  query reads only `queue_terminal_rollups.pruned_completed_count` and
  the new `queue_terminal_live_counts` table, bounded by
  `num_ready_slots × num_priority × num_enqueue_shards`.
- Every code path that adds or removes a terminal row keeps the
  counters correct: completion (`insert_done_rows_tx`), retry-from-
  terminal (`retry_job_tx`), DLQ move (`move_failed_to_dlq` and the
  DLQ-on-fail completion branches), discard, and prune (folds slot
  live counts into `pruned_completed_count` then clears).
- Tests cover increment (completion), decrement (retry-from-terminal,
  DLQ move, discard), prune-fold (live + rollup converge), and
  recovery/rebuild from `done_entries` if such a path is added.
- Benchmark includes the skewed worst case — one queue, one priority,
  one enqueue shard, high completion rate — so the write-path
  contention on the resulting hot counter row is measured before
  merging. Compare to the current `queue_counts_exact` cost at the
  same workload to confirm the trade is net positive at the relevant
  backlog sizes.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf(queue_storage): live terminal counter for queue_counts_exact #290

Context

Proposal

Risks

Out of scope here

Empirical first step (optional)

Acceptance criteria

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

perf(queue_storage): live terminal counter for queue_counts_exact #290

Description

Context

Proposal

Risks

Out of scope here

Empirical first step (optional)

Acceptance criteria

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions