Claim hot-path slows ~33% with deadline rescue enabled at high concurrency #246

## Context

At 1×256 worker concurrency, with the depth-target producer driving `awa.queue_lanes` near saturation, **enabling per-claim deadline rescue costs ~33% of steady-state throughput and raises p99 latency by ~50%** on the claim hot path. The cost is spread across the whole claim path rather than concentrated in a single rescue-specific query, which suggests structural overhead (contention on something shared) rather than an additive per-claim cost.

Reproduction shape: `awa-bench` long_horizon scenario, single replica, 256 workers, depth-target = 4000, `JOB_WORK_MS=1`, `producer-mode=depth-target`, 30 s warmup + 120 s clean. Two cells, identical except `LEASE_DEADLINE_MS`:

| | rescue OFF (`LEASE_DEADLINE_MS=0`) | rescue ON (library default) | Δ |
|---|--:|--:|--:|
| `completion_rate` | 5,419 jobs/s | 3,649 jobs/s | **−33 %** |
| offered (depth-target burst) | 3,905 jobs/s | 2,752 jobs/s | −30 % |
| `end_to_end_p99_ms` | 100 ms | 151 ms | +51 % |

## What `pg_stat_statements` says

`pg_stat_statements_reset()` immediately before each cell, snapshot at +90 s into the clean phase. Top hot-path queries (`mean_exec_time`, ms):

| query | OFF mean | ON mean | Δ |
|---|--:|--:|--:|
| `UPDATE awa.queue_enqueue_heads SET next_seq = next_seq + $3 ...` | 14.8 | 18.0 | +22 % |
| `INSERT INTO awa.queue_enqueue_heads ON CONFLICT DO NOTHING` | 5.6 | 7.3 | +30 % |
| `SELECT ... FROM awa.claim_ready_runtime(...)` | 1.28 | 1.60 | +25 % |
| `INSERT INTO awa.queue_lanes ON CONFLICT DO NOTHING` | 2.9 | 3.3 | +14 % |
| `INSERT INTO awa.queue_claim_heads ON CONFLICT DO NOTHING` | 2.8 | 3.3 | +18 % |
| `UPDATE awa.queue_lanes ... deltas ...` | 2.2 | 2.5 | +13 % |
| `UPDATE awa.queue_lanes SET available_count = ...` | 1.7 | 2.0 | +18 % |

Two things this **doesn't** show:

1. **No new query in the top 50 with rescue ON.** If the rescue scanner were a periodic standalone scan firing visibly, it should appear. It doesn't: either it runs inside the regular claim path (e.g. force-close happens inline in `claim_ready_runtime`), or it's batched at a low enough cadence that, even at a 33 % throughput cost, it stays below the noise floor.
2. **The cost is uniform, not localized.** Every hot query is 13–30 % slower with rescue ON. That's the fingerprint of contention on a shared resource (lock contention, buffer-cache pressure, index walk cost), not "the rescue path itself adds N ms per claim."
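The uniformity claim can be checked directly against the figures in the table above. The following is a minimal sketch that recomputes the per-query slowdown and its spread; the short query labels are shorthand introduced here, and the numbers are copied from the table, not re-measured.

```python
# Mean execution times (ms) copied from the pg_stat_statements table above:
# (shorthand label, rescue OFF mean, rescue ON mean).
ROWS = [
    ("UPDATE queue_enqueue_heads next_seq", 14.8, 18.0),
    ("INSERT queue_enqueue_heads", 5.6, 7.3),
    ("SELECT claim_ready_runtime", 1.28, 1.60),
    ("INSERT queue_lanes", 2.9, 3.3),
    ("INSERT queue_claim_heads", 2.8, 3.3),
    ("UPDATE queue_lanes deltas", 2.2, 2.5),
    ("UPDATE queue_lanes available_count", 1.7, 2.0),
]


def slowdown_pct(off_ms, on_ms):
    """Relative change in mean execution time, in percent."""
    return (on_ms - off_ms) / off_ms * 100.0


def summarize(rows):
    """Return per-query slowdowns plus the min and max across all queries."""
    deltas = {label: slowdown_pct(off, on) for label, off, on in rows}
    return deltas, min(deltas.values()), max(deltas.values())


if __name__ == "__main__":
    deltas, lo, hi = summarize(ROWS)
    for label, pct in deltas.items():
        print(f"{label:40s} {pct:+.1f} %")
    print(f"range: {lo:+.1f} % .. {hi:+.1f} %")
```

Every query slows down by a similar amount and none stays flat, which is what "uniform, not localized" means here.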

The body of `claim_ready_runtime` documents the mechanism in a comment:

> `deadline_at` is the per-claim deadline when the queue has a non-zero `deadline_duration`; the deadline-rescue path scans expired rows (anti-joined with closures and leases — same disambiguation as the heartbeat-rescue path) and force-closes them.

## Hypothesis

The most consistent explanation for **uniform per-query slowdown without a visible new query** is:

> **`awa.lease_claims` working set inflates when rescue is on, and every other query that touches `lease_claims` (every claim, every completion) pays the cost in index lookup time / page fetches / lock contention.**

That'd happen if claims awaiting force-close linger in `lease_claims` longer than they would when `deadline_at IS NULL` (where the table only ever sees a row for the duration of an in-flight job).

Equally consistent secondary hypothesis: the rescue scanner is doing a sequential scan or partial-index miss across `lease_claims`, holding short locks that serialize against the regular claim INSERT/UPDATE path.

## Proposed experiments

In rough order of cheap → less cheap:

1. **`pg_stat_user_tables.n_live_tup` snapshot of `awa.lease_claims` at end-of-clean in both cells.** Because the table is partitioned, sum the counts across its partitions. If rescue ON shows materially more rows than rescue OFF at steady state, that strongly supports the working-set inflation hypothesis. ~5 minutes of work.
2. **Add a partial index on `awa.lease_claims (deadline_at) WHERE deadline_at IS NOT NULL`** and re-run rescue ON. If the per-claim latency tax drops, the rescue scanner is doing an avoidable seqscan / full-index walk.
3. **Tune the rescue scanner's batch size / wake interval** (or, if it's currently inline in `claim_ready_runtime`, hoist it out to a background task with its own cadence). Look for the knee where the scanner reaps fast enough to keep `lease_claims` small but doesn't dominate the claim path.
4. **Flame graph of awa-bench at 1×256 with rescue on vs off.** Would directly confirm whether the time goes to PG roundtrips, lock waits, or in-process work in the awa-worker claim loop.

The first two are non-invasive and should be enough to localize the cost. (3) and (4) are only needed if the first two come back ambiguous.
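Experiment (1) could be scripted roughly as follows. This is a sketch under assumptions: it assumes `awa.lease_claims` is a partitioned parent whose direct children hold the rows (a partitioned parent has no storage of its own, so tuple counts are tracked per partition), and the helper names `total_tuples` and `inflation_ratio` are hypothetical, not part of awa or the bench harness. Note that `n_live_tup` is an estimate; a `count(*)` per partition is the exact but more expensive alternative.

```python
# SQL to run in each cell at end-of-clean. Assumes awa.lease_claims is a
# partitioned parent; pg_inherits lists its direct child partitions.
LEASE_CLAIMS_STATS_SQL = """
SELECT c.relname, s.n_live_tup, s.n_dead_tup
FROM pg_inherits i
JOIN pg_class c ON c.oid = i.inhrelid
JOIN pg_stat_user_tables s ON s.relid = c.oid
WHERE i.inhparent = 'awa.lease_claims'::regclass
"""


def total_tuples(rows):
    """Sum live and dead tuples over (relname, n_live_tup, n_dead_tup) rows."""
    live = sum(r[1] for r in rows)
    dead = sum(r[2] for r in rows)
    return live, dead


def inflation_ratio(off_rows, on_rows):
    """Ratio of rescue-ON to rescue-OFF live tuples; None if OFF has none."""
    off_live, _ = total_tuples(off_rows)
    on_live, _ = total_tuples(on_rows)
    return None if off_live == 0 else on_live / off_live
```

Run the query with any Postgres driver in each cell and pass the two result sets to `inflation_ratio`. A ratio well above 1 would point toward the working-set hypothesis. The dead-tuple totals are worth a look too, since bloat from force-closed rows would add to the pages each lookup touches.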

## Safety considerations

- **Don't disable `deadline_at` writes by default to fix this.** Per-claim deadline rescue is the documented fallback path for stuck workers under correctness invariants the chaos suite relies on; the wrong fix would be making the bench look better at the cost of a real reliability mechanism.
- **The fix should not change the visible behavior of force-close.** If a job's deadline expires, it must still be force-closed and re-eligible for claim. Both inline (current) and background (proposed) implementations need to preserve this.
- **The partial-index proposal needs a migration plan.** `awa.lease_claims` is partitioned in the current schema; the partial index needs to be defined on each partition and on the parent table.
- **If the working-set hypothesis is right, there's an upstream knock-on for chaos scenarios.** A larger steady-state `lease_claims` table also means slower force-close on a true wedge, because the rescue scan walks more rows. Worth measuring how long it takes a wedged claim to be force-closed at high concurrency before merging any tuning that lets the table grow.

## Reproducer

`awa-bench` adapter from the [postgresql-job-queue-benchmarking](https://github.com/hardbyte/postgresql-job-queue-benchmarking) repo at branch `bench/2026-05-07-awa-alpha6-pgque-rc1`:

```sh
docker compose up -d --wait
LEASE_DEADLINE_MS=0 \
  uv run bench run --systems awa --replicas 1 \
    --producer-rate 50000 --producer-mode depth-target --target-depth 4000 \
    --worker-count 256 \
    --phase warmup=warmup:30s --phase clean=clean:120s

# repeat without LEASE_DEADLINE_MS to use library default
```

Tested against awa `0.6.0-alpha.6` and `0.6.0-alpha.7`; both reproduce the same shape.

`pg_stat_statements` snapshots are committed under `results/2026-05-08-rescue-perf-probe/snapshots/` for direct inspection.
