Proposal: Ping-aware Streaming Watchdog in SDK (fixes #749, #867, and upstream Claude Code hangs)
Thesis
The client should adapt to the server and current load — not guess. And the adapting should live in the SDK, not in every application built on top of it.
Every reliability problem in this family (#749, #867, #46987, #33949, #39755, #25979) traces back to the same two architectural mistakes:
Clients are forced to guess. They pick fixed timeout numbers with zero information about what the server is actually doing. Those numbers are always wrong — too aggressive for Opus thinking, too lax for real hangs, blind to queues, blind to outages, and scattered across every SDK and every client reimplementing the same heuristics from scratch.
Reliability logic lives in the wrong layer. Today the public, MIT-licensed SDK is a bare-bones Stainless-generated HTTP wrapper with zero idle-timeout, retry, fallback, or ping handling. Meanwhile Claude Code's closed-source 13 MB cli.js re-implements all of those primitives from scratch — retry wrappers, watchdogs, fallback paths, silent 529 recovery, partial-yield throws, telemetry events, kill switches. Every fix has to happen twice, in two places that don't share code, one of which is unreviewable and ships multiple minified versions per day. The v2.1.104 partial response received regression exists because this duplication made it easy to ship a user-hostile change in cli.js without anyone being able to review it.
The SDK should own every reliability primitive. That means: streaming with built-in idle-timeout, ping-aware watchdog, retry with backoff, automatic fallback to non-streaming, 529 handling, request-id correlation, rate-limit awareness, graceful abort propagation. All MIT-licensed, all auditable, all shared across every language SDK and every downstream application.
Claude Code (and every other client) should contain pure business logic. Tools, permissions, prompt construction, UI rendering, agent coordination, session storage, slash commands. None of them should touch HTTP retry loops or setTimeout-based watchdogs ever again. If Claude Code calls sdk.messages.stream() and it takes too long, that is the SDK's problem to detect, explain, and recover from.
Clients can override individual behaviors where genuinely necessary — but the contracts live in the SDK. This is not "SDK is a black box you can't touch." It's "SDK defines every reliability primitive as an extensible interface; clients plug in different implementations of specific hooks without forking the contract." A concrete shape:
```ts
client.messages.stream(params, {
  // opt-in behaviors, strongly-typed, all optional
  idleTimeout: { respectServerBudget: true, safetyFactor: 1.5 },
  retry: { maxAttempts: 3, backoff: 'exponential' },
  fallback: { onWatchdogAbort: 'non-streaming' },
  // override hooks with typed interfaces — not string-matched reimplementations
  onRateLimited: (info) => { /* custom UI */ },
  onStreamStalled: (ctx) => ctx.retry({ strategy: 'cached' }),
  onPing: (ping) => { /* custom telemetry */ },
  // replace whole primitives via DI — contract stays the SDK's
  retryStrategy: customRetryStrategy,  // implements interface RetryStrategy
  fallbackStrategy: customFallback,    // implements interface FallbackStrategy
})
```
The principle: clients can replace the implementation of any reliability primitive, but they cannot invent new contracts. RetryStrategy, FallbackStrategy, WatchdogPolicy, RateLimitHandler — all defined in the SDK as public interfaces, with sensible defaults. A client that needs something unusual (telemetry injection, custom UI during retry, organization-specific backoff) swaps the implementation. It does not rewrite for await loops and setTimeout watchdogs from scratch.
This matters because today Claude Code has already fragmented the contract. Claude Code's watchdog, fallback, retry, rate-limit handling — none of them have interfaces anyone else can see, let alone reuse. The moment any third-party application wants the same reliability guarantees as Claude Code, they have to start from zero and read 13 MB of minified JavaScript. That is why the same class of bugs keeps resurfacing across the ecosystem.
The server knows what it's doing. The server knows its current load. The server knows how long the next ping will take. Stop making the client guess, and stop making every application built on top of Anthropic's API reinvent the wheel poorly.
TL;DR
Anthropic's SSE stream emits `ping` events every ~15-30s as a liveness signal, but the TypeScript SDK silently drops them (`if (event === 'ping') continue`). This prevents any downstream consumer — including the Claude Code CLI — from distinguishing "server is actively thinking" from "stream is silently stalled".
Result: every reliability watchdog built on top of the SDK suffers false-positive aborts during long model thinking (Opus with high effort can go silent for 60-120s between text deltas), and real hangs are still indistinguishable from normal operation.
Proposed fix, in four evolutionary steps (each strictly better, each backwards-compatible with the previous):
1. Forward pings — yield `{ type: 'ping', timestamp }` from the SDK iterator instead of silently dropping it.
2. Semantic pings — the server adds a `status` field (`'queued'` / `'thinking'` / `'tool_executing'` / `'generating'` / `'rate_limited'`).
3. Adaptive client thresholds — different timeouts per state instead of one global magic number.
4. Server-driven thresholds — the server tells the client `nextPingWithinMs`; the client just respects it. No hardcoded numbers anywhere.
Step 1 alone fixes #749 (API completeness), #867 (SDK-level streaming idle timeout), and the upstream regression flooding Claude Code's #46987 (40+ reports in 24h) caused by the v2.1.104 partial response received error path. Steps 2-4 eliminate the class of problem entirely.
Evidence

1. SDK currently drops ping events

In `@anthropic-ai/sdk`, `src/core/streaming.ts` (around lines 78-84):

```ts
for await (const sse of iterator) {
  // ... parse sse.data into Item and yield
  if (sse.event === 'ping') {
    continue; // ← dropped, never yielded
  }
  if (sse.event === 'error') {
    throw new APIError(undefined, safeJSON(sse.data) ?? sse.data, undefined, response.headers);
  }
}
```
Pings are silently consumed. Errors are surfaced as thrown APIError, which is correct behavior, but neither event ever reaches the consumer as a streamed event object — so downstream code can't tell when the last proof-of-life arrived.
From the #749 thread (where @RobertCraigie responded as a collaborator in May 2025 and @manucorporat weighed in):

> "I think it's still valuable to receive the ping events in the async iterator, it's useful for detecting if the connection is still alive during a long tool call, and distinguish from the API hanging, and not emitting more tokens."
The issue has been open for 11 months with no action.
2. Claude Code hits this exact problem every day
Claude Code implements its own streaming watchdog in closed-source `cli.js` (env: `CLAUDE_ENABLE_STREAM_WATCHDOG=1`). Because it consumes the SDK's async iterator, it cannot see ping events. The watchdog resets only on yielded events (content deltas), so any thinking-silent period > `CLAUDE_STREAM_IDLE_TIMEOUT_MS` (default 90s) triggers an abort.
Reverse-engineered watchdog loop (cli.js v2.1.104, ~offset 11,856,400):

```ts
for await (let event of stream) {
  resetStreamIdleTimer() // resets only on actual content events
  // ... process event
}
```
Before v2.1.104 the abort silently fell back to non-streaming mode (users wasted 2× tokens but didn't see errors). In v2.1.104 the fallback was removed when partial data has been received, replaced with a hard error:

```
API Error: Stream idle timeout - partial response received
```

Now false-positive watchdog aborts surface as hard errors with no recovery path.

3. Today's regression wave — #46987 (40+ reports)

anthropics/claude-code#46987
Users on Opus + extended thinking are getting repeated `API Error: Stream idle timeout - partial response received` on retry, consistently unrecoverable without a process restart. Community root-cause analysis (@gabrimatic):

> "The culprit is `CLAUDE_ENABLE_STREAM_WATCHDOG=1`. It kills the stream after 90 seconds of no data — but with Opus and high effort thinking, the model can easily go silent for longer than that while reasoning."
Workarounds suggested in the thread:
- Disable the watchdog → lose all hang protection
- Raise `CLAUDE_STREAM_IDLE_TIMEOUT_MS` to 300-600s → catch hangs 5-10× slower
Both are band-aids. Neither addresses the root cause.
4. Our own logs confirm the false-positive pattern
From a naga debugging session (2026-04-13, Claude Code v2.1.104):
```
12:22:47 [WARN] Streaming idle warning: no chunks received for 45s
12:23:32 [ERROR] Streaming idle timeout: no chunks received for 90s, aborting stream
12:23:32 [ERROR] Error in API request: Stream idle timeout - partial response received
12:25:21 [ERROR] Streaming idle timeout: no chunks received for 90s, aborting stream
12:25:21 [ERROR] Error in API request: Stream idle timeout - partial response received
12:30:12 [ERROR] Streaming idle timeout: no chunks received for 90s, aborting stream
12:30:12 [ERROR] Error in API request: Stream idle timeout - partial response received
```
Three consecutive retries, three different x-client-request-id, all abort at exactly 90s into streaming. Process restart with an otherwise-identical conversation resumed streaming normally. The server was sending pings the whole time — we just couldn't see them.
5. #867 independently reached the same conclusion via a different path
#867 (December 2025):

@notactuallytreyanastasio documented a real Opus 4.5 hang where the thinking block completed with `output_tokens: 4`, `stop_reason: null`, and the stream never delivered the next event, and proposed adding an SDK-level streaming idle timeout.
Problem with a plain idle timeout: it hits the same false-positive wall as Claude Code's watchdog. Without ping awareness, you can't distinguish "thinking" from "hung" — you can only pick a threshold and hope.
Proposal
Change in `@anthropic-ai/sdk` (`src/core/streaming.ts`)

```ts
// Before (around lines 78-84)
if (sse.event === 'ping') {
  continue;
}
if (sse.event === 'error') {
  throw new APIError(undefined, safeJSON(sse.data) ?? sse.data, undefined, response.headers);
}

// After — yield ping as a typed event so consumers can reset idle timers.
// Error behavior is preserved (still thrown) because errors already have
// a proper APIError path; we just make pings observable.
if (sse.event === 'ping') {
  yield { type: 'ping', timestamp: Date.now() } as Item;
  continue;
}
if (sse.event === 'error') {
  throw new APIError(undefined, safeJSON(sse.data) ?? sse.data, undefined, response.headers);
}
```
Add to the `RawMessageStreamEvent` union:

```ts
export type PingEvent = { type: 'ping'; timestamp: number }
export type StreamErrorEvent = { type: 'error'; error: APIError }

export type RawMessageStreamEvent =
  | MessageStartEvent
  | MessageDeltaEvent
  | MessageStopEvent
  | ContentBlockStartEvent
  | ContentBlockDeltaEvent
  | ContentBlockStopEvent
  | PingEvent         // ← new
  | StreamErrorEvent  // ← new
```
Impact

- `RawMessageStreamEvent` becomes complete and type-safe. The API's streaming protocol documents `ping` as a standard event; the SDK hiding it is a silent contract violation.
- Fixes Claude Code #46987 (and #33949, #39755 in the same family) — Claude Code's watchdog can reset on pings; the 90s default becomes safe again; no need to push users to 5-10 minute timeouts.
- Zero breaking changes for consumers that were already ignoring unknown event types. Consumers that want the new behavior opt in via discriminated-union exhaustiveness.
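As an illustration of that opt-in path, a consumer-side liveness tracker could treat the proposed ping event as proof-of-life alongside content deltas. This is a sketch; the `StreamEvent` union below is a simplified stand-in for the real one:

```typescript
// Sketch: consumer-side liveness tracking, assuming the proposed
// { type: 'ping'; timestamp: number } event is yielded by the SDK iterator.
type StreamEvent =
  | { type: 'content_block_delta'; delta: { text: string } }
  | { type: 'ping'; timestamp: number };

function makeLivenessTracker(now: () => number = Date.now) {
  let lastProofOfLife = now();
  return {
    observe(event: StreamEvent): void {
      // Both content and pings count as proof-of-life; today only content can,
      // because the SDK never yields the ping.
      lastProofOfLife = now();
      if (event.type === 'content_block_delta') {
        // ... hand off to rendering
      }
    },
    idleMs(): number {
      return now() - lastProofOfLife;
    },
  };
}
```

A watchdog built on `idleMs()` then fires only when pings stop too, not merely when tokens pause.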
Going further: semantic ping events

A plain `{ type: 'ping', timestamp }` event already solves the liveness problem. But pings could carry what the server is actually doing, which would unlock much better UX and smarter client-side decisions:
```ts
export type PingEvent = {
  type: 'ping'
  timestamp: number
  status: 'queued' | 'thinking' | 'tool_executing' | 'generating' | 'rate_limited'
  queuePosition?: number    // when status === 'queued'
  estimatedStartMs?: number // when status === 'queued'
  thinkingDepth?: number    // when status === 'thinking' — tokens spent so far
  toolName?: string         // when status === 'tool_executing'
}
```
Why this matters:
Honest UX, not fake reassurance: today every long delay renders as the same "still working...". A queued request (no work happening yet) looks identical to an Opus that's been thinking for 90 seconds. Users can't tell the difference and blame the client. With status: 'queued', queuePosition: 47, Claude Code could show "Waiting in queue — 47 ahead of you, ~30s".
Worth calling out: Claude Code v2.1.98+ added three reassuring "still working" UI messages that fire on pure setTimeout:
```js
// cli.js v2.1.104, ~offset 12,702,600
k35 = [
  { afterMs: 30000,  text: "Thinking a bit longer… still working on it…" },
  { afterMs: 90000,  text: "This is a harder one… it might take a few more minutes…" },
  { afterMs: 270000, text: "Hang tight… really working through this one…" }
]
```
These are not health checks. They are plain timers that fire regardless of whether the server is actually alive, queued, rate-limited, or silently hung. If the API dropped 20 minutes ago, the user still sees "Hang tight… really working through this one…" at the 4:30 mark — an implicit lie. This is the opposite of what pings are for. Semantic ping events would let the client show the real state, and the reassurance messages would only appear when there's actually activity to reassure about.
Smarter watchdogs: idle threshold should depend on what the server is doing. A 90s idle while status === 'queued' means nothing (queue could be long). The same 90s while status === 'generating' is a real stall. One timeout value for all states will always be either too aggressive or too lax.
Honest rate limit backpressure: today rate limits surface as 429s only after the client has sent a request and the server has decided to reject it. A status: 'rate_limited' ping event during a queued request would let the client back off earlier, with the actual limit name (see #46987's workaround discussion about tuning timeouts blindly).
Deterministic cancellation: users pressing ESC during a status: 'queued' request cost zero tokens. Users pressing ESC during status: 'generating' forfeit partial output. Telling them which state they're in turns a guess into a decision.
Debugging: today when a stream hangs, the only information is "no chunks for N seconds". With semantic pings the support flow changes from "reproduce it and send us a session ID" to "last ping was status: 'thinking', thinkingDepth: 8192, then silence for 120s" — that's an actual breadcrumb.
Adaptive thresholds instead of fixed timeouts — the honest approach: today every client picks one idle threshold and applies it globally. Anthropic's own SDK/CLI picks 90s. Users hitting partial response received are told to raise it to 300-600s (#46987). This punishes everyone for the worst case: fast requests wait too long on real hangs, slow requests false-positive on normal behavior.
A truly honest client should scale its threshold with the server's current state, which it can only know if pings carry that state:
```ts
function adaptiveIdleTimeout(lastPing: PingEvent): number {
  switch (lastPing.status) {
    case 'queued':
      // Queue time is not our problem. Let the server tell us when it's our turn.
      // Only abort if we stop receiving pings at all.
      return 5 * 60_000
    case 'thinking':
      // Extended thinking on Opus can legitimately take minutes on hard problems.
      // Scale with depth: deeper thinking = longer budget.
      // (thinkingDepth is optional, so fall back to the 60s floor if absent.)
      return Math.max(60_000, (lastPing.thinkingDepth ?? 0) * 15)
    case 'tool_executing':
      // Server-side tool (web_search, advisor) — unknown upper bound, trust pings.
      return 3 * 60_000
    case 'generating':
      // Active token output — any silence over 30s is almost certainly a stall.
      return 30_000
    case 'rate_limited':
      // Don't abort at all, let the server's retry-after drive the wait.
      return Infinity
  }
}
```
This is the only way to give generating requests fast failure (30s) while also tolerating 5-minute queue waits. A single global threshold cannot do both. The adaptive approach is honest because it uses whatever information the server chooses to share, instead of pretending the server state doesn't exist.
Server load also matters globally, not just per-request. If the rate-limit ping or a new status: 'server_degraded' marker indicates API-wide backpressure, every request's threshold can auto-relax without the user having to manually tune env vars during an incident. Today when Anthropic has an outage, users worldwide simultaneously edit CLAUDE_STREAM_IDLE_TIMEOUT_MS — that's a bug, not a workaround.
Server-driven thresholds — stop hardcoding magic numbers in clients: the adaptive example above is still wrong in one important way — it hardcodes the thresholds (30_000, 60_000, 3 * 60_000) in client code. Even with perfect semantic pings, every SDK in every language would re-implement the same numbers, drift from each other, and require coordinated updates when server characteristics change.
The right place for those numbers is inside the ping payload itself:
```ts
export type PingEvent = {
  type: 'ping'
  timestamp: number
  status: 'queued' | 'thinking' | 'tool_executing' | 'generating' | 'rate_limited'
  nextPingWithinMs: number // ← server tells the client when to expect the next ping
  // ... other semantic fields
}
```
Client logic reduces to one line:
```ts
// Abort if we don't see another ping/event within server-declared budget × safety factor
const threshold = lastPing.nextPingWithinMs * 1.5
```
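Fleshed out slightly (still assuming the proposed `nextPingWithinMs` field), the whole client-side policy becomes a pure function of the last ping, with no magic numbers beyond a single safety factor:

```typescript
// Sketch: derive the abort deadline purely from the server-declared budget.
// `nextPingWithinMs` is a proposed field, not part of the current API.
type ServerPing = { type: 'ping'; timestamp: number; nextPingWithinMs: number };

const SAFETY_FACTOR = 1.5;

function abortDeadline(lastPing: ServerPing): number {
  // The server's own estimate, padded by a safety factor, is the only input.
  return lastPing.timestamp + lastPing.nextPingWithinMs * SAFETY_FACTOR;
}

function shouldAbort(lastPing: ServerPing, nowMs: number): boolean {
  return nowMs > abortDeadline(lastPing);
}
```

Every ping refreshes the deadline, so the server can widen or tighten the budget mid-request and the client follows automatically.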
Why this is strictly better:
- Single source of truth: Anthropic owns the numbers. No client can be wrong.
- Per-model, per-endpoint, per-region: Opus on a cold backend can send `nextPingWithinMs: 120_000`, Haiku on a hot one `30_000`, Bedrock Sonnet whatever AWS's infrastructure requires. All from the same SDK code.
- Live incident response: during an outage the server can widen budgets during active requests. No config reloads, no restarts, no tribal knowledge about which env var to tune.
- A/B testing & deprecation: Anthropic can roll out new model behaviors (e.g. Opus 4.7 with deeper thinking) without bumping every client. Just bump `nextPingWithinMs` from the server.
- Fair to diverse hardware: users on slow networks get longer budgets automatically via TCP/path characteristics surfacing in the server's own measurements.
- One-line client logic ("abort after `nextPingWithinMs × 1.5`"). Currently every client implements its own heuristic, which is why #46987, #867, #39755, #25979, and #33949 all exist.
The pattern is proven elsewhere: HTTP/2 SETTINGS_INITIAL_WINDOW_SIZE, WebSocket Ping/Pong with server-specified intervals, Kafka heartbeat.interval.ms sent via broker metadata, DNS TTL. Every one of these puts the timing decision on the entity that actually knows the answer (the server/protocol/service), not on every client individually guessing.
Minimal server change: one extra field in the SSE ping data payload. No new endpoints, no protocol version bump, no API surface additions — just use the bytes that are already on the wire.
Backwards compatibility: status fields are optional. A server that doesn't emit them still sends plain pings. Consumers that don't care just look at type === 'ping' and reset their timer.
Server-side change required: this is not purely an SDK fix. The SSE stream protocol itself needs to ship status in the data: payload of ping events. Today pings are empty event: ping\ndata: {}\n\n. Extending to event: ping\ndata: {"status":"thinking"}\n\n is backwards-compatible with any existing parser.
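On the parsing side, backwards compatibility is easy to see: a consumer that treats the `data:` payload as optional JSON handles both today's empty payload and the proposed extended one through the same path. `parsePingData` below is an illustrative sketch, not SDK code:

```typescript
// Sketch: parsing the ping `data:` payload in a backwards-compatible way.
// Today's wire format (`data: {}`) and a payload carrying the proposed
// semantic fields both parse through the same code.
type PingData = { status?: string; nextPingWithinMs?: number };

function parsePingData(raw: string): PingData {
  try {
    return JSON.parse(raw) as PingData;
  } catch {
    return {}; // tolerate empty or garbled data, as existing parsers must
  }
}
```

Old servers yield `{}` and the consumer just resets its timer; new servers yield real fields and the consumer can do better.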
We understand this is a larger ask than just "forward the ping". We're including it because the ping event is the natural carrier for this information, and adding it later (after SDKs standardize on silent ping) will be harder than getting it right the first time. At minimum, please consider reserving the data payload in the SSE protocol spec for future semantic fields, even if the SDK change lands as a plain { type: 'ping' } first.
Optional follow-up: built-in idle-timeout helper

Given #867's proposal, a natural add-on:

```ts
const stream = client.messages.stream(
  {
    model: 'claude-opus-4-6',
    // ...
  },
  {
    idleTimeoutMs: 90_000, // ← resets on ANY event, including ping
  },
)
```
Behind the scenes: a `setTimeout` reset on every yielded event (including the new ping events). When it fires: abort the request's `AbortController` and throw a typed `StreamIdleTimeoutError`.
This composes cleanly — users who don't want a timeout just don't pass the option. Users who have their own watchdog (like Claude Code) can still build it themselves because pings are now visible.
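For illustration, such a helper can be sketched as a generic wrapper over any async iterator. `withIdleTimeout` and `StreamIdleTimeoutError` are hypothetical names, not current SDK exports:

```typescript
// Illustrative sketch of the proposed idle-timeout helper: wrap an async
// iterator so that silence longer than `idleTimeoutMs` aborts the stream.
class StreamIdleTimeoutError extends Error {
  constructor(public idleTimeoutMs: number) {
    super(`Stream idle for more than ${idleTimeoutMs}ms`);
  }
}

async function* withIdleTimeout<T>(
  source: AsyncIterable<T>,
  idleTimeoutMs: number,
  abort: AbortController = new AbortController(),
): AsyncGenerator<T> {
  const iterator = source[Symbol.asyncIterator]();
  while (true) {
    let timer: ReturnType<typeof setTimeout> | undefined;
    // Race each pull against a freshly reset idle timer. Any yielded event,
    // including a now-visible ping, resets the clock.
    const timeout = new Promise<never>((_, reject) => {
      timer = setTimeout(() => {
        abort.abort(); // tear down the underlying HTTP stream
        reject(new StreamIdleTimeoutError(idleTimeoutMs));
      }, idleTimeoutMs);
    });
    try {
      const result = await Promise.race([iterator.next(), timeout]);
      if (result.done) return;
      yield result.value;
    } finally {
      clearTimeout(timer);
    }
  }
}
```

Because the timer resets on every yield, forwarding pings through the iterator is exactly what makes a 90s default safe for long thinking periods.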
Why this matters (and why the architecture is wrong as-is)
Currently:
- SDK: MIT-licensed, auto-generated by Stainless, bare-bones HTTP wrapper, no reliability logic.
- Claude Code `cli.js`: 13 MB of minified closed-source code that re-implements retry, fallback, timeout, watchdog, and recovery on top of the thin SDK.
Every reliability fix has to happen twice: once in Stainless (hard), and once in the bundled cli.js (also hard, because it's minified, unreviewable, and ships multiple versions per day). The result is that the public SDK stays bare-bones while cli.js collects workarounds and regressions — v2.1.104's partial response received error is a recent example of a "fix" that made the user experience strictly worse for a real class of requests.
Reliability primitives belong in the SDK, not in closed application code. Starting with ping forwarding is the smallest, lowest-risk step toward that.
References

- `src/streaming.ts#L65-L69` — the drop site
- #749 (requesting `ping` explicitly)