Context
At 1×256 worker concurrency, with the depth-target producer driving awa.queue_lanes near saturation, enabling per-claim deadline rescue costs ~33% of steady-state throughput and adds ~50% to p99 latency on the claim hot path. The cost is distributed across the whole claim path rather than concentrated in a single rescue-specific query, which suggests a structural rather than additive overhead.
Reproduction shape: awa-bench long_horizon scenario, single replica, 256 workers, depth-target = 4000, JOB_WORK_MS=1, producer-mode=depth-target, 30 s warmup + 120 s clean. Two cells, identical except LEASE_DEADLINE_MS:
| | rescue OFF (LEASE_DEADLINE_MS=0) | rescue ON (library default) | Δ |
|---|---|---|---|
| completion_rate | 5,419 jobs/s | 3,649 jobs/s | −33 % |
| offered (depth-target burst) | 3,905 jobs/s | 2,752 jobs/s | −30 % |
| end_to_end_p99_ms | 100 ms | 151 ms | +51 % |
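As a sanity check on the headline numbers, the sketch below recomputes the Δ column from the two cells. It also derives an implied per-worker claim-to-claim time from Little's law. That second figure is an estimate that assumes all 256 workers stay continuously occupied; it is not something the benchmark reports.

```python
# Headline numbers copied from the two benchmark cells above.
CELLS = {
    "completion_rate (jobs/s)": (5419, 3649),
    "offered (jobs/s)": (3905, 2752),
    "end_to_end_p99_ms": (100, 151),
}
WORKERS = 256


def pct_delta(off: float, on: float) -> float:
    """Relative change from rescue OFF to rescue ON, in percent."""
    return (on - off) / off * 100.0


def implied_cycle_ms(workers: int, jobs_per_s: float) -> float:
    """Mean per-worker claim-to-claim time, including any wait.

    Little's law (concurrency = throughput x time), ASSUMING every worker
    is continuously occupied. A derived estimate, not a measurement.
    """
    return workers / jobs_per_s * 1000.0


if __name__ == "__main__":
    for name, (off, on) in CELLS.items():
        print(f"{name}: {pct_delta(off, on):+.0f} %")
    off_rate, on_rate = CELLS["completion_rate (jobs/s)"]
    print(f"implied cycle, rescue OFF: {implied_cycle_ms(WORKERS, off_rate):.1f} ms")
    print(f"implied cycle, rescue ON:  {implied_cycle_ms(WORKERS, on_rate):.1f} ms")
```

Under that saturation assumption the implied cycle grows from roughly 47 ms to roughly 70 ms per job while JOB_WORK_MS stays at 1, so nearly all of the difference would sit in claim and completion overhead rather than in job work.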
What pg_stat_statements says
pg_stat_statements_reset() immediately before each cell, snapshot at +90 s into the clean phase. Top hot-path queries (mean_exec_time, ms):
| query | OFF mean | ON mean | Δ |
|---|---|---|---|
| UPDATE awa.queue_enqueue_heads SET next_seq = next_seq + $3 ... | 14.8 | 18.0 | +22 % |
| INSERT INTO awa.queue_enqueue_heads ON CONFLICT DO NOTHING | 5.6 | 7.3 | +30 % |
| SELECT ... FROM awa.claim_ready_runtime(...) | 1.28 | 1.60 | +25 % |
| INSERT INTO awa.queue_lanes ON CONFLICT DO NOTHING | 2.9 | 3.3 | +14 % |
| INSERT INTO awa.queue_claim_heads ON CONFLICT DO NOTHING | 2.8 | 3.3 | +18 % |
| UPDATE awa.queue_lanes ... deltas ... | 2.2 | 2.5 | +13 % |
| UPDATE awa.queue_lanes SET available_count = ... | 1.7 | 2.0 | +18 % |
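The reset-then-snapshot procedure behind this table can be sketched as below. It is a minimal illustration, not the harness that produced the committed snapshots. It assumes a psycopg-style DB-API cursor (`%s` placeholders), PostgreSQL 13+ column names, and ordering by `total_exec_time` for the "top" cut. The helper names `snapshot`, `compare` and `only_in` are hypothetical.

```python
from typing import Any, Dict, List, Tuple

# Run this immediately before each cell.
RESET_SQL = "SELECT pg_stat_statements_reset()"

# mean_exec_time / total_exec_time are the PostgreSQL 13+ column names;
# older servers expose mean_time / total_time instead.
# ASSUMPTION: "top" means highest total_exec_time.
SNAPSHOT_SQL = """
SELECT query, calls, mean_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT %s
"""


def snapshot(cur: Any, limit: int = 50) -> Dict[str, float]:
    """Return {query text: mean_exec_time in ms} for the top statements.

    Identical query text under different users or databases collapses to
    one key here, which is acceptable for a single-database bench run.
    """
    cur.execute(SNAPSHOT_SQL, (limit,))
    return {query: float(mean_ms) for query, _calls, mean_ms in cur.fetchall()}


def compare(off: Dict[str, float], on: Dict[str, float]) -> List[Tuple[str, float, float, float]]:
    """Pair queries present in both snapshots and compute the percent change."""
    rows = []
    for query in off.keys() & on.keys():
        if off[query] == 0:
            continue  # avoid dividing by a zero baseline
        delta = (on[query] - off[query]) / off[query] * 100.0
        rows.append((query, off[query], on[query], delta))
    # Slowest OFF-cell queries first, matching the table's ordering.
    return sorted(rows, key=lambda r: r[1], reverse=True)


def only_in(on: Dict[str, float], off: Dict[str, float]) -> List[str]:
    """Queries seen with rescue ON but not OFF; a standalone scanner would land here."""
    return sorted(on.keys() - off.keys())
```

`only_in` is the check that matters for the first observation below: an empty result means no rescue-specific statement made the cut.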
Two things this doesn't show:
- No new query in the top-50 with rescue ON. If the rescue scanner were a periodic standalone scan firing visibly, it should appear. It doesn't: either it's inside the regular claim path (e.g. force-close happens inline in claim_ready_runtime), or it's batched at a low enough cadence that even with 33 % throughput cost it stays below the noise floor.
- The cost is uniform, not localized. Every hot query is 13–30 % slower with rescue ON. That's the fingerprint of contention on a shared resource (lock contention, buffer-cache pressure, index walk cost), not "the rescue path itself adds N ms per claim."
The body of claim_ready_runtime documents the mechanism in a comment:
deadline_at is the per-claim deadline when the queue has a non-zero deadline_duration; the deadline-rescue path scans expired rows (anti-joined with closures and leases — same disambiguation as the heartbeat-rescue path) and force-closes them.
Hypothesis
The most consistent explanation for uniform per-query slowdown without a visible new query is:
awa.lease_claims working set inflates when rescue is on, and every other query that touches lease_claims (every claim, every completion) pays the cost in index lookup time / page fetches / lock contention.
That'd happen if claims awaiting force-close linger in lease_claims longer than they would when deadline_at IS NULL (where the table only ever sees a row for the duration of an in-flight job).
Equally consistent secondary hypothesis: the rescue scanner is doing a sequential scan or partial-index miss across lease_claims, holding short locks that serialize against the regular claim INSERT/UPDATE path.
Proposed experiments
In rough order of cheap → less cheap:
1. pg_stat_user_tables.n_live_tup snapshot of awa.lease_claims at end-of-clean in both cells. If rescue ON shows materially more rows than rescue OFF at steady state, the working-set inflation hypothesis is confirmed. ~5 minutes of work.
2. Add an awa.lease_claims (deadline_at) WHERE deadline_at IS NOT NULL partial index and re-run rescue ON. If the per-claim latency tax drops, the rescue scanner is doing an avoidable seqscan / full-index walk.
3. Tune the rescue scanner's batch size / wake interval (or, if it's currently inline in claim_ready_runtime, hoist it out to a background task with its own cadence). Look for the knee where the scanner reaps fast enough to keep lease_claims small but doesn't dominate the claim path.
4. Flame graph of awa-bench at 1×256 with rescue on vs off. Would directly confirm whether the time goes to PG roundtrips, lock waits, or in-process work in the awa-worker claim loop.
The first two are non-invasive and should be enough to localize the cost. (3) and (4) are needed if the first two come back ambiguous.
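A possible shape for the first experiment is sketched below. Because awa.lease_claims is partitioned, the parent's own row in pg_stat_user_tables holds no tuples, so the sketch sums over leaf partitions via `pg_partition_tree()` (PostgreSQL 12+). The helper name and the psycopg-style cursor are assumptions. Including `n_dead_tup` is an addition beyond the experiment as stated, on the guess that bloat may matter as much as live rows.

```python
from typing import Any, Dict

# pg_partition_tree() lists every relation in the partition tree.
# Only leaf partitions store rows, so only those are summed.
LIVE_TUPLES_SQL = """
SELECT s.relname, s.n_live_tup, s.n_dead_tup
FROM pg_partition_tree(%s::regclass) AS p
JOIN pg_stat_user_tables AS s ON s.relid = p.relid
WHERE p.isleaf
ORDER BY s.relname
"""


def lease_claims_tuples(cur: Any, table: str = "awa.lease_claims") -> Dict[str, int]:
    """Sum the statistics collector's tuple estimates across all leaf partitions.

    n_live_tup and n_dead_tup are estimates, not exact counts; they are
    good enough to compare two cells but not to audit individual rows.
    """
    cur.execute(LIVE_TUPLES_SQL, (table,))
    live = dead = 0
    for _relname, n_live, n_dead in cur.fetchall():
        live += n_live
        dead += n_dead
    return {"n_live_tup": live, "n_dead_tup": dead}
```

Calling this at end-of-clean in each cell gives two small dicts to compare. A materially larger `n_live_tup` with rescue ON would point at working-set inflation.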
Safety considerations
- Don't disable deadline_at writes by default to fix this. Per-claim deadline rescue is the documented fallback path for stuck workers under correctness invariants the chaos suite relies on; the wrong fix would be making the bench look better at the cost of a real reliability mechanism.
- The fix should not change the visible behavior of force-close. If a job's deadline expires, it must still be force-closed and re-eligible for claim. Both inline (current) and background (proposed) implementations need to preserve this.
- The partial-index proposal needs a migration plan. awa.lease_claims is partitioned in current schema; the partial index needs to be defined per-partition and on the master.
- If the working-set hypothesis is right, there's an upstream knock-on for chaos scenarios. A larger steady-state lease_claims table also means slower force-close on a true wedge, because the rescue scan walks more rows. Worth measuring how long it takes a wedged claim to be force-closed at high concurrency before merging any tuning that lets the table grow.
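One possible shape for the partial-index migration follows the standard PostgreSQL pattern for partitioned tables: create the index on the parent with `ON ONLY`, build each partition's index with `CONCURRENTLY`, then attach each one. `CREATE INDEX CONCURRENTLY` is not supported directly on a partitioned parent, hence the three steps. The sketch only generates the statements. The index name and the partition names in the test are hypothetical, and identifiers are assumed not to need quoting.

```python
from typing import List

INDEX_NAME = "lease_claims_deadline_at_idx"  # hypothetical index name
PREDICATE = "WHERE deadline_at IS NOT NULL"


def partial_index_ddl(parent: str, partitions: List[str],
                      index_name: str = INDEX_NAME) -> List[str]:
    """Return DDL statements, in order, for a partial index on a partitioned table.

    `parent` and each entry of `partitions` are schema-qualified names
    such as "awa.lease_claims". An index always lives in its table's schema.
    """
    parent_schema, _, _ = parent.rpartition(".")
    parent_prefix = f"{parent_schema}." if parent_schema else ""

    # Step 1: parent-only index. Cheap to create, and it stays invalid
    # until every partition has an attached index.
    stmts = [f"CREATE INDEX {index_name} ON ONLY {parent} (deadline_at) {PREDICATE}"]

    for part in partitions:
        schema, _, rel = part.rpartition(".")
        prefix = f"{schema}." if schema else ""
        child_idx = f"{rel}_deadline_at_idx"
        # Step 2: build without blocking writes. CONCURRENTLY must run
        # outside a transaction block.
        stmts.append(
            f"CREATE INDEX CONCURRENTLY {child_idx} ON {part} (deadline_at) {PREDICATE}"
        )
        # Step 3: attach; after the last attach the parent index becomes valid.
        stmts.append(
            f"ALTER INDEX {parent_prefix}{index_name} ATTACH PARTITION {prefix}{child_idx}"
        )
    return stmts
```

Once the parent index exists, partitions created later pick up a matching index automatically, so only the partitions present at migration time need the explicit build and attach.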
Reproducer
awa-bench adapter from the postgresql-job-queue-benchmarking repo at branch bench/2026-05-07-awa-alpha6-pgque-rc1:
docker compose up -d --wait
LEASE_DEADLINE_MS=0 \
uv run bench run --systems awa --replicas 1 \
--producer-rate 50000 --producer-mode depth-target --target-depth 4000 \
--worker-count 256 \
--phase warmup=warmup:30s --phase clean=clean:120s
# repeat without LEASE_DEADLINE_MS to use library default
Tested against awa 0.6.0-alpha.6 and 0.6.0-alpha.7; both reproduce the same shape.
pg_stat_statements snapshots are committed under results/2026-05-08-rescue-perf-probe/snapshots/ for direct inspection.
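For convenience, the two cells can be driven back to back from a small wrapper. The sketch below reuses the exact command line above and only varies LEASE_DEADLINE_MS. The wrapper itself (`cell_env`, `run_cells`) is hypothetical and not part of the benchmarking repo.

```python
import os
import subprocess
from typing import Dict, List, Optional

# Command line copied from the reproducer above.
BENCH_ARGV: List[str] = [
    "uv", "run", "bench", "run", "--systems", "awa", "--replicas", "1",
    "--producer-rate", "50000", "--producer-mode", "depth-target",
    "--target-depth", "4000", "--worker-count", "256",
    "--phase", "warmup=warmup:30s", "--phase", "clean=clean:120s",
]


def cell_env(lease_deadline_ms: Optional[str],
             base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Build the environment for one cell.

    "0" disables rescue; None removes the variable entirely so the
    library default applies, even if the caller's shell had it set.
    """
    env = dict(os.environ if base is None else base)
    env.pop("LEASE_DEADLINE_MS", None)
    if lease_deadline_ms is not None:
        env["LEASE_DEADLINE_MS"] = lease_deadline_ms
    return env


def run_cells() -> None:
    """Start the stack, then run rescue OFF followed by rescue ON."""
    subprocess.run(["docker", "compose", "up", "-d", "--wait"], check=True)
    for label, value in (("rescue OFF", "0"), ("rescue ON", None)):
        print(f"== {label} ==")
        subprocess.run(BENCH_ARGV, env=cell_env(value), check=True)
```

Calling `run_cells()` from the repo root runs both cells. Explicitly removing the variable for the ON cell guards against a stale LEASE_DEADLINE_MS leaking in from the calling shell.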