Proposal: Ping-aware Streaming Watchdog in SDK (fixes #749, #867, and upstream Claude Code hangs)
Thesis
The client should adapt to the server and current load — not guess. And the adapting should live in the SDK, not in every application built on top of it.
Every reliability problem in this family (#749, #867, #46987, #33949, #39755, #25979) traces back to the same two architectural mistakes:
Clients are forced to guess. They pick fixed timeout numbers with zero information about what the server is actually doing. Those numbers are always wrong — too aggressive for Opus thinking, too lax for real hangs, blind to queues, blind to outages, and scattered across every SDK and every client reimplementing the same heuristics from scratch.
Reliability logic lives in the wrong layer. Today the public, MIT-licensed SDK is a bare-bones Stainless-generated HTTP wrapper with zero idle-timeout, retry, fallback, or ping handling. Meanwhile Claude Code's closed-source 13 MB cli.js re-implements all of those primitives from scratch — retry wrappers, watchdogs, fallback paths, silent 529 recovery, partial-yield throws, telemetry events, kill switches. Every fix has to happen twice, in two places that don't share code, one of which is unreviewable and ships multiple minified versions per day. The v2.1.104 partial response received regression exists because this duplication made it easy to ship a user-hostile change in cli.js without anyone being able to review it.
The SDK should own every reliability primitive. That means: streaming with built-in idle-timeout, ping-aware watchdog, retry with backoff, automatic fallback to non-streaming, 529 handling, request-id correlation, rate-limit awareness, graceful abort propagation. All MIT-licensed, all auditable, all shared across every language SDK and every downstream application.
Claude Code (and every other client) should contain pure business logic. Tools, permissions, prompt construction, UI rendering, agent coordination, session storage, slash commands. None of them should touch HTTP retry loops or setTimeout-based watchdogs ever again. If Claude Code calls sdk.messages.stream() and it takes too long, that is the SDK's problem to detect, explain, and recover from.
Clients can override individual behaviors where genuinely necessary — but the contracts live in the SDK. This is not "SDK is a black box you can't touch." It's "SDK defines every reliability primitive as an extensible interface; clients plug in different implementations of specific hooks without forking the contract." A concrete shape:
```ts
client.messages.stream(params, {
  // opt-in behaviors, strongly-typed, all optional
  idleTimeout: { respectServerBudget: true, safetyFactor: 1.5 },
  retry: { maxAttempts: 3, backoff: 'exponential' },
  fallback: { onWatchdogAbort: 'non-streaming' },
  // override hooks with typed interfaces — not string-matched reimplementations
  onRateLimited: (info) => { /* custom UI */ },
  onStreamStalled: (ctx) => ctx.retry({ strategy: 'cached' }),
  onPing: (ping) => { /* custom telemetry */ },
  // replace whole primitives via DI — contract stays the SDK's
  retryStrategy: customRetryStrategy,  // implements interface RetryStrategy
  fallbackStrategy: customFallback,    // implements interface FallbackStrategy
})
```
The principle: clients can replace the implementation of any reliability primitive, but they cannot invent new contracts. RetryStrategy, FallbackStrategy, WatchdogPolicy, RateLimitHandler — all defined in the SDK as public interfaces, with sensible defaults. A client that needs something unusual (telemetry injection, custom UI during retry, organization-specific backoff) swaps the implementation. It does not rewrite for await loops and setTimeout watchdogs from scratch.
This matters because today Claude Code has already fragmented the contract. Claude Code's watchdog, fallback, retry, rate-limit handling — none of them have interfaces anyone else can see, let alone reuse. The moment any third-party application wants the same reliability guarantees as Claude Code, they have to start from zero and read 13 MB of minified JavaScript. That is why the same class of bugs keeps resurfacing across the ecosystem.
The server knows what it's doing. The server knows its current load. The server knows how long the next ping will take. Stop making the client guess, and stop making every application built on top of Anthropic's API reinvent the wheel poorly.
TL;DR
Anthropic's SSE stream emits `ping` events every ~15-30s as a liveness signal, but the TypeScript SDK silently drops them (`if (event === 'ping') continue`). This prevents any downstream consumer — including the Claude Code CLI — from distinguishing "server is actively thinking" from "stream is silently stalled".
Result: every reliability watchdog built on top of the SDK suffers false-positive aborts during long model thinking (Opus with high effort can go silent for 60-120s between text deltas), and real hangs are still indistinguishable from normal operation.
Proposed fix, in four evolutionary steps (each strictly better, each backwards-compatible with the previous):
1. Forward pings — yield `{ type: 'ping', timestamp }` from the SDK iterator instead of silently dropping it.
2. Semantic pings — the server adds a `status` field (`'queued'` / `'thinking'` / `'tool_executing'` / `'generating'` / `'rate_limited'`).
3. Adaptive client thresholds — different timeouts per state instead of one global magic number.
4. Server-driven thresholds — the server tells the client `nextPingWithinMs`; the client just respects it. No hardcoded numbers anywhere.
Step 1 alone fixes #749 (API completeness), #867 (SDK-level streaming idle timeout), and the upstream regression flooding Claude Code's #46987 (40+ reports in 24h) caused by the v2.1.104 partial response received error path. Steps 2-4 eliminate the class of problem entirely.
Evidence

1. SDK currently drops ping events

In `@anthropic-ai/sdk`, `src/core/streaming.ts` (around lines 78-84):

```ts
for await (const sse of iterator) {
  // ... parse sse.data into Item and yield
  if (sse.event === 'ping') {
    continue; // ← dropped, never yielded
  }
  if (sse.event === 'error') {
    throw new APIError(undefined, safeJSON(sse.data) ?? sse.data, undefined, response.headers);
  }
}
```
Pings are silently consumed. Errors are surfaced as thrown APIError, which is correct behavior, but neither event ever reaches the consumer as a streamed event object — so downstream code can't tell when the last proof-of-life arrived.
From the #749 thread (where @RobertCraigie responded as a collaborator in May 2025 and @manucorporat weighed in):

> "I think it's still valuable to receive the ping events in the async iterator, it's useful for detecting if the connection is still alive during a long tool call, and distinguish from the API hanging, and not emitting more tokens."
The issue has been open for 11 months with no action.
2. Claude Code hits this exact problem every day
Claude Code implements its own streaming watchdog in closed-source `cli.js` (env: `CLAUDE_ENABLE_STREAM_WATCHDOG=1`). Because it consumes the SDK's async iterator, it cannot see ping events. The watchdog resets only on yielded events (content deltas), so any thinking-silent period > `CLAUDE_STREAM_IDLE_TIMEOUT_MS` (default 90s) triggers an abort.
Reverse-engineered watchdog loop (cli.js v2.1.104, ~offset 11,856,400):

```ts
for await (let event of stream) {
  resetStreamIdleTimer() // resets only on actual content events
  // ... process event
}
```
Before v2.1.104 the abort silently fell back to non-streaming mode (users wasted 2× tokens but didn't see errors). In v2.1.104 the fallback was removed when partial data has been received, replaced with a hard error:

```
API Error: Stream idle timeout - partial response received
```

Now false-positive watchdog aborts surface as hard errors with no recovery path.

3. Today's regression wave — #46987 (40+ reports)

anthropics/claude-code#46987
Users on Opus + extended thinking are getting repeated `API Error: Stream idle timeout - partial response received` on retry, consistently unrecoverable without a process restart. Community root-cause analysis (@gabrimatic):

> "The culprit is `CLAUDE_ENABLE_STREAM_WATCHDOG=1`. It kills the stream after 90 seconds of no data — but with Opus and high effort thinking, the model can easily go silent for longer than that while reasoning."
Workarounds suggested in the thread:
- Disable the watchdog → lose all hang protection
- Raise `CLAUDE_STREAM_IDLE_TIMEOUT_MS` to 300-600s → catch hangs 5-10× slower
Both are band-aids. Neither addresses the root cause.
4. Our own logs confirm the false-positive pattern
From a naga debugging session (2026-04-13, Claude Code v2.1.104):
```
12:22:47 [WARN] Streaming idle warning: no chunks received for 45s
12:23:32 [ERROR] Streaming idle timeout: no chunks received for 90s, aborting stream
12:23:32 [ERROR] Error in API request: Stream idle timeout - partial response received
12:25:21 [ERROR] Streaming idle timeout: no chunks received for 90s, aborting stream
12:25:21 [ERROR] Error in API request: Stream idle timeout - partial response received
12:30:12 [ERROR] Streaming idle timeout: no chunks received for 90s, aborting stream
12:30:12 [ERROR] Error in API request: Stream idle timeout - partial response received
```
Three consecutive retries, three different x-client-request-id, all abort at exactly 90s into streaming. Process restart with an otherwise-identical conversation resumed streaming normally. The server was sending pings the whole time — we just couldn't see them.
5. #867 independently reached the same conclusion via a different path
#867 (December 2025):

@notactuallytreyanastasio documented a real Opus 4.5 hang where the thinking block completed with `output_tokens: 4`, `stop_reason: null`, and the stream never delivered the next event, and proposed adding an SDK-level streaming idle timeout.
Problem with a plain idle timeout: it hits the same false-positive wall as Claude Code's watchdog. Without ping awareness, you can't distinguish "thinking" from "hung" — you can only pick a threshold and hope.
Proposal
Change in `@anthropic-ai/sdk` (`src/core/streaming.ts`)

```ts
// Before (around lines 78-84)
if (sse.event === 'ping') {
  continue;
}
if (sse.event === 'error') {
  throw new APIError(undefined, safeJSON(sse.data) ?? sse.data, undefined, response.headers);
}

// After — yield ping as a typed event so consumers can reset idle timers.
// Error behavior is preserved (still thrown) because errors already have
// a proper APIError path; we just make pings observable.
if (sse.event === 'ping') {
  yield { type: 'ping', timestamp: Date.now() } as Item;
  continue;
}
if (sse.event === 'error') {
  throw new APIError(undefined, safeJSON(sse.data) ?? sse.data, undefined, response.headers);
}
```
Add to the `RawMessageStreamEvent` union:

```ts
export type PingEvent = { type: 'ping'; timestamp: number }
export type StreamErrorEvent = { type: 'error'; error: APIError }

export type RawMessageStreamEvent =
  | MessageStartEvent
  | MessageDeltaEvent
  | MessageStopEvent
  | ContentBlockStartEvent
  | ContentBlockDeltaEvent
  | ContentBlockStopEvent
  | PingEvent         // ← new
  | StreamErrorEvent  // ← new
```
Impact

- `RawMessageStreamEvent` becomes complete and type-safe. The API's streaming protocol documents `ping` as a standard event; the SDK hiding it is a silent contract violation.
- Fixes Claude Code #46987 (and #33949, #39755 in the same family) — Claude Code's watchdog can reset on pings; the 90s default becomes safe again; no need to push users to 5-10 minute timeouts.
- Zero breaking changes for consumers that were already ignoring unknown event types. Consumers that want the new behavior opt in via discriminated-union exhaustiveness.
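As an illustration of that opt-in path, a consumer-side liveness tracker could treat the proposed ping event as proof-of-life alongside content deltas. This is a sketch; the `StreamEvent` union below is a simplified stand-in for the real one:

```typescript
// Sketch: consumer-side liveness tracking, assuming the proposed
// { type: 'ping'; timestamp: number } event is yielded by the SDK iterator.
type StreamEvent =
  | { type: 'content_block_delta'; delta: { text: string } }
  | { type: 'ping'; timestamp: number };

function makeLivenessTracker(now: () => number = Date.now) {
  let lastProofOfLife = now();
  return {
    observe(event: StreamEvent): void {
      // Both content and pings count as proof-of-life; today only content can,
      // because the SDK never yields the ping.
      lastProofOfLife = now();
      if (event.type === 'content_block_delta') {
        // ... hand off to rendering
      }
    },
    idleMs(): number {
      return now() - lastProofOfLife;
    },
  };
}
```

A watchdog built on `idleMs()` then fires only when pings stop too, not merely when tokens pause.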
Going further: semantic ping events

A plain `{ type: 'ping', timestamp }` event already solves the liveness problem. But pings could carry what the server is actually doing, which would unlock much better UX and smarter client-side decisions:
```ts
export type PingEvent = {
  type: 'ping'
  timestamp: number
  status: 'queued' | 'thinking' | 'tool_executing' | 'generating' | 'rate_limited'
  queuePosition?: number    // when status === 'queued'
  estimatedStartMs?: number // when status === 'queued'
  thinkingDepth?: number    // when status === 'thinking' — tokens spent so far
  toolName?: string         // when status === 'tool_executing'
}
```
Why this matters:
Honest UX, not fake reassurance: today every long delay renders as the same "still working...". A queued request (no work happening yet) looks identical to an Opus that's been thinking for 90 seconds. Users can't tell the difference and blame the client. With status: 'queued', queuePosition: 47, Claude Code could show "Waiting in queue — 47 ahead of you, ~30s".
Worth calling out: Claude Code v2.1.98+ added three reassuring "still working" UI messages that fire on pure setTimeout:
```js
// cli.js v2.1.104, ~offset 12,702,600
k35 = [
  { afterMs: 30000,  text: "Thinking a bit longer… still working on it…" },
  { afterMs: 90000,  text: "This is a harder one… it might take a few more minutes…" },
  { afterMs: 270000, text: "Hang tight… really working through this one…" }
]
```
These are not health checks. They are plain timers that fire regardless of whether the server is actually alive, queued, rate-limited, or silently hung. If the API dropped 20 minutes ago, the user still sees "Hang tight… really working through this one…" at the 4:30 mark — an implicit lie. This is the opposite of what pings are for. Semantic ping events would let the client show the real state, and the reassurance messages would only appear when there's actually activity to reassure about.
Smarter watchdogs: idle threshold should depend on what the server is doing. A 90s idle while status === 'queued' means nothing (queue could be long). The same 90s while status === 'generating' is a real stall. One timeout value for all states will always be either too aggressive or too lax.
Honest rate limit backpressure: today rate limits surface as 429s only after the client has sent a request and the server has decided to reject it. A status: 'rate_limited' ping event during a queued request would let the client back off earlier, with the actual limit name (see #46987's workaround discussion about tuning timeouts blindly).
Deterministic cancellation: users pressing ESC during a status: 'queued' request cost zero tokens. Users pressing ESC during status: 'generating' forfeit partial output. Telling them which state they're in turns a guess into a decision.
Debugging: today when a stream hangs, the only information is "no chunks for N seconds". With semantic pings the support flow changes from "reproduce it and send us a session ID" to "last ping was status: 'thinking', thinkingDepth: 8192, then silence for 120s" — that's an actual breadcrumb.
Adaptive thresholds instead of fixed timeouts — the honest approach: today every client picks one idle threshold and applies it globally. Anthropic's own SDK/CLI picks 90s. Users hitting partial response received are told to raise it to 300-600s (#46987). This punishes everyone for the worst case: fast requests wait too long on real hangs, slow requests false-positive on normal behavior.
A truly honest client should scale its threshold with the server's current state, which it can only know if pings carry that state:
```ts
function adaptiveIdleTimeout(lastPing: PingEvent): number {
  switch (lastPing.status) {
    case 'queued':
      // Queue time is not our problem. Let the server tell us when it's our turn.
      // Only abort if we stop receiving pings at all.
      return 5 * 60_000
    case 'thinking':
      // Extended thinking on Opus can legitimately take minutes on hard problems.
      // Scale with depth: deeper thinking = longer budget.
      // (thinkingDepth is optional, so fall back to the 60s floor if absent.)
      return Math.max(60_000, (lastPing.thinkingDepth ?? 0) * 15)
    case 'tool_executing':
      // Server-side tool (web_search, advisor) — unknown upper bound, trust pings.
      return 3 * 60_000
    case 'generating':
      // Active token output — any silence over 30s is almost certainly a stall.
      return 30_000
    case 'rate_limited':
      // Don't abort at all, let the server's retry-after drive the wait.
      return Infinity
  }
}
```
This is the only way to give generating requests fast failure (30s) while also tolerating 5-minute queue waits. A single global threshold cannot do both. The adaptive approach is honest because it uses whatever information the server chooses to share, instead of pretending the server state doesn't exist.
Server load also matters globally, not just per-request. If the rate-limit ping or a new status: 'server_degraded' marker indicates API-wide backpressure, every request's threshold can auto-relax without the user having to manually tune env vars during an incident. Today when Anthropic has an outage, users worldwide simultaneously edit CLAUDE_STREAM_IDLE_TIMEOUT_MS — that's a bug, not a workaround.
Server-driven thresholds — stop hardcoding magic numbers in clients: the adaptive example above is still wrong in one important way — it hardcodes the thresholds (30_000, 60_000, 3 * 60_000) in client code. Even with perfect semantic pings, every SDK in every language would re-implement the same numbers, drift from each other, and require coordinated updates when server characteristics change.
The right place for those numbers is inside the ping payload itself:
```ts
export type PingEvent = {
  type: 'ping'
  timestamp: number
  status: 'queued' | 'thinking' | 'tool_executing' | 'generating' | 'rate_limited'
  nextPingWithinMs: number // ← server tells the client when to expect the next ping
  // ... other semantic fields
}
```
Client logic reduces to one line:
```ts
// Abort if we don't see another ping/event within server-declared budget × safety factor
const threshold = lastPing.nextPingWithinMs * 1.5
```
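Fleshed out slightly (still assuming the proposed `nextPingWithinMs` field), the whole client-side policy becomes a pure function of the last ping, with no magic numbers beyond a single safety factor:

```typescript
// Sketch: derive the abort deadline purely from the server-declared budget.
// `nextPingWithinMs` is a proposed field, not part of the current API.
type ServerPing = { type: 'ping'; timestamp: number; nextPingWithinMs: number };

const SAFETY_FACTOR = 1.5;

function abortDeadline(lastPing: ServerPing): number {
  // The server's own estimate, padded by a safety factor, is the only input.
  return lastPing.timestamp + lastPing.nextPingWithinMs * SAFETY_FACTOR;
}

function shouldAbort(lastPing: ServerPing, nowMs: number): boolean {
  return nowMs > abortDeadline(lastPing);
}
```

Every ping refreshes the deadline, so the server can widen or tighten the budget mid-request and the client follows automatically.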
Why this is strictly better:
- Single source of truth: Anthropic owns the numbers. No client can be wrong.
- Per-model, per-endpoint, per-region: Opus on a cold backend can send `nextPingWithinMs: 120_000`, Haiku on a hot one `30_000`, Bedrock Sonnet whatever AWS's infrastructure requires. All from the same SDK code.
- Live incident response: during an outage the server can widen budgets during active requests. No config reloads, no restarts, no tribal knowledge about which env var to tune.
- A/B testing & deprecation: Anthropic can roll out new model behaviors (e.g. Opus 4.7 with deeper thinking) without bumping every client. Just bump `nextPingWithinMs` from the server.
- Fair to diverse hardware: users on slow networks get longer budgets automatically via TCP/path characteristics surfacing in the server's own measurements.
- One-line client logic ("abort after `nextPingWithinMs × 1.5`"). Currently every client implements its own heuristic, which is why #46987, #867, #39755, #25979, and #33949 all exist.
The pattern is proven elsewhere: HTTP/2 SETTINGS_INITIAL_WINDOW_SIZE, WebSocket Ping/Pong with server-specified intervals, Kafka heartbeat.interval.ms sent via broker metadata, DNS TTL. Every one of these puts the timing decision on the entity that actually knows the answer (the server/protocol/service), not on every client individually guessing.
Minimal server change: one extra field in the SSE ping data payload. No new endpoints, no protocol version bump, no API surface additions — just use the bytes that are already on the wire.
Backwards compatibility: status fields are optional. A server that doesn't emit them still sends plain pings. Consumers that don't care just look at type === 'ping' and reset their timer.
Server-side change required: this is not purely an SDK fix. The SSE stream protocol itself needs to ship status in the data: payload of ping events. Today pings are empty event: ping\ndata: {}\n\n. Extending to event: ping\ndata: {"status":"thinking"}\n\n is backwards-compatible with any existing parser.
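On the parsing side, backwards compatibility is easy to see: a consumer that treats the `data:` payload as optional JSON handles both today's empty payload and the proposed extended one through the same path. `parsePingData` below is an illustrative sketch, not SDK code:

```typescript
// Sketch: parsing the ping `data:` payload in a backwards-compatible way.
// Today's wire format (`data: {}`) and a payload carrying the proposed
// semantic fields both parse through the same code.
type PingData = { status?: string; nextPingWithinMs?: number };

function parsePingData(raw: string): PingData {
  try {
    return JSON.parse(raw) as PingData;
  } catch {
    return {}; // tolerate empty or garbled data, as existing parsers must
  }
}
```

Old servers yield `{}` and the consumer just resets its timer; new servers yield real fields and the consumer can do better.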
We understand this is a larger ask than just "forward the ping". We're including it because the ping event is the natural carrier for this information, and adding it later (after SDKs standardize on silent ping) will be harder than getting it right the first time. At minimum, please consider reserving the data payload in the SSE protocol spec for future semantic fields, even if the SDK change lands as a plain { type: 'ping' } first.
Optional follow-up: built-in idle-timeout helper

Given #867's proposal, a natural add-on:

```ts
const stream = client.messages.stream(
  {
    model: 'claude-opus-4-6',
    // ...
  },
  {
    idleTimeoutMs: 90_000, // ← resets on ANY event, including ping
  },
)
```
Behind the scenes: a `setTimeout` reset on every yielded event (including the new ping events). When it fires: abort the request's `AbortController` and throw a typed `StreamIdleTimeoutError`.
This composes cleanly — users who don't want a timeout just don't pass the option. Users who have their own watchdog (like Claude Code) can still build it themselves because pings are now visible.
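For illustration, such a helper can be sketched as a generic wrapper over any async iterator. `withIdleTimeout` and `StreamIdleTimeoutError` are hypothetical names, not current SDK exports:

```typescript
// Illustrative sketch of the proposed idle-timeout helper: wrap an async
// iterator so that silence longer than `idleTimeoutMs` aborts the stream.
class StreamIdleTimeoutError extends Error {
  constructor(public idleTimeoutMs: number) {
    super(`Stream idle for more than ${idleTimeoutMs}ms`);
  }
}

async function* withIdleTimeout<T>(
  source: AsyncIterable<T>,
  idleTimeoutMs: number,
  abort: AbortController = new AbortController(),
): AsyncGenerator<T> {
  const iterator = source[Symbol.asyncIterator]();
  while (true) {
    let timer: ReturnType<typeof setTimeout> | undefined;
    // Race each pull against a freshly reset idle timer. Any yielded event,
    // including a now-visible ping, resets the clock.
    const timeout = new Promise<never>((_, reject) => {
      timer = setTimeout(() => {
        abort.abort(); // tear down the underlying HTTP stream
        reject(new StreamIdleTimeoutError(idleTimeoutMs));
      }, idleTimeoutMs);
    });
    try {
      const result = await Promise.race([iterator.next(), timeout]);
      if (result.done) return;
      yield result.value;
    } finally {
      clearTimeout(timer);
    }
  }
}
```

Because the timer resets on every yield, forwarding pings through the iterator is exactly what makes a 90s default safe for long thinking periods.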
Why this matters (and why the architecture is wrong as-is)
Currently:
- SDK: MIT-licensed, auto-generated by Stainless, bare-bones HTTP wrapper, no reliability logic.
- Claude Code `cli.js`: 13 MB of minified closed-source code that re-implements retry, fallback, timeout, watchdog, and recovery on top of the thin SDK.
Every reliability fix has to happen twice: once in Stainless (hard), and once in the bundled cli.js (also hard, because it's minified, unreviewable, and ships multiple versions per day). The result is that the public SDK stays bare-bones while cli.js collects workarounds and regressions — v2.1.104's partial response received error is a recent example of a "fix" that made the user experience strictly worse for a real class of requests.
Reliability primitives belong in the SDK, not in closed application code. Starting with ping forwarding is the smallest, lowest-risk step toward that.
References

- `src/streaming.ts#L65-L69` — the drop site
- #749 (requesting `ping` explicitly)