fix(routing): feed least_busy in-flight counts on /v1/messages and /v1/responses#716
Conversation
…1/responses begin_in_flight was only called from chat.rs (#684 left the passthrough endpoints as a follow-up), so a least_busy group serving Anthropic-SDK or Codex traffic saw all-zero counts and the stable sort silently degraded to declaration order — failover in disguise, same silent-degradation class as AISIX-Cloud#954. Mirror both chat.rs patterns across the four dispatch functions (messages: anthropic_passthrough_dispatch + cross_provider_dispatch; responses: responses_to_target + responses_cross_provider_to_target): non-streaming / error paths hold the guard for the attempt scope, the streaming branches move it into the end-of-stream closure next to stream_hold so the count stays raised for the stream's full lifetime. count_tokens stays deliberately excluded, matching least_latency. The new e2e (occupy the declared-first slow target, assert the next request diverts to the idle one, attribution via per-upstream request counters) fails on the pre-fix binary for both endpoints and passes after. Fixes #715
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughExtends ChangesLeast-busy in-flight tracking fix
Estimated code review effort: 3 (Moderate) | ~25 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 5 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (5 passed)
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
tests/e2e/src/cases/least-busy-passthrough-endpoints-e2e.test.ts (1)
137-239: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winAdd
stream:truecases for the stream-lifetime guard paths.These tests only exercise non-streaming requests, but the PR also changes streaming closures to hold
in_flightuntil end-of-stream. Add streaming variants for/v1/messagesand/v1/responsesso a regression that drops the guard at handler return is caught. As per coding guidelines, “Tests must cover boundary cases … combination scenarios, and extreme cases.”🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/e2e/src/cases/least-busy-passthrough-endpoints-e2e.test.ts` around lines 137 - 239, Add streaming coverage for the least_busy passthrough guard paths in the existing `/v1/messages` and `/v1/responses` e2e tests. The current cases only validate non-streaming requests, so extend the same test block around `postMessages`, `postResponses`, and the `least_busy` virtual models to send `stream:true` requests and assert routing still diverts away from the in-flight upstream until the stream ends. Keep the existing slow/fast upstream setup and request counter checks so a regression that releases `in_flight` at handler return is caught for both stream-lifetime paths.Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tests/e2e/src/cases/least-busy-passthrough-endpoints-e2e.test.ts`:
- Around line 174-180: The least-busy passthrough test is using a fixed sleep
after starting the occupy request, which can race before the upstream has
actually incremented in_flight. Update the affected test cases in
least-busy-passthrough-endpoints-e2e.test.ts to wait by polling the upstream
request count/state until the occupy request is observed, then send the
diversion request. Use the existing postMessages flow and the relevant
upstream/request-count helpers in the test to make the wait deterministic
instead of time-based.
---
Nitpick comments:
In `@tests/e2e/src/cases/least-busy-passthrough-endpoints-e2e.test.ts`:
- Around line 137-239: Add streaming coverage for the least_busy passthrough
guard paths in the existing `/v1/messages` and `/v1/responses` e2e tests. The
current cases only validate non-streaming requests, so extend the same test
block around `postMessages`, `postResponses`, and the `least_busy` virtual
models to send `stream:true` requests and assert routing still diverts away from
the in-flight upstream until the stream ends. Keep the existing slow/fast
upstream setup and request counter checks so a regression that releases
`in_flight` at handler return is caught for both stream-lifetime paths.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 1e1c3165-6e0c-408e-945d-33aff46898d4
📒 Files selected for processing (3)
crates/aisix-proxy/src/messages.rscrates/aisix-proxy/src/responses.rstests/e2e/src/cases/least-busy-passthrough-endpoints-e2e.test.ts
CodeRabbit: a fixed 200ms delay can race on slow CI — poll the slow upstream's request count instead. Arrival implies the in-flight guard is already held (acquired before the upstream send).
Problem
least_busyonly worked on/v1/chat/completions.begin_in_flighthad exactly two call sites, both in chat.rs (#684 wired chat only and left the passthrough endpoints as a follow-up that never landed). Traffic on/v1/messages(Claude Code / Anthropic SDK) or/v1/responses(Codex) never raised the in-flight count, so every target tied at 0 and the stable sort preserved declaration order — the strategy silently degraded to failover, with the first target absorbing all concurrency. Same silent-degradation class as AISIX-Cloud#954. The inverse also held: load arriving via the passthrough endpoints was invisible to chat-side least_busy decisions.Fix
Mirror the two existing chat.rs patterns across the four passthrough dispatch functions (messages:
anthropic_passthrough_dispatch+cross_provider_dispatch; responses:responses_to_target+responses_cross_provider_to_target):InFlightGuardfor the attempt scope (acquired right before the upstream send, mirroring chat.rs:1784);drop(stream_hold), so the count stays raised for the stream's full lifetime (mirroring chat.rs:1228)./v1/messages/count_tokensstays deliberately excluded, matching the least_latency EWMA scope.Tests
New
least-busy-passthrough-endpoints-e2e.test.ts: occupy the declared-first slow target with an un-awaited request, assert the next request diverts to the idle target; attribution via per-upstreamreceivedRequestscounters. Fails on the pre-fix binary for both endpoints (messages: diverted request served by the busy target; responses: slow upstream took 2 requests) and passes after the fix. Existing least-busy/least-latency/model-group/responses e2e suites pass unchanged (8/8),cargo test -p aisix-proxy582 passed, fmt + clippy clean.Fixes #715
Summary by CodeRabbit
least_busyrouting to keep upstreams marked as busy for the full duration of streaming passthrough, preventing premature “idle” reassignment./v1/messagesand/v1/responsesstreaming paths.least_busypassthrough endpoints using slow vs. fast upstreams, asserting diverted requests reach the idle target and that per-upstream request counts update as expected.